1. INTRODUCTION


This paper considers the assignment of weights for neural networks with one hidden layer so that the network interpolates through a given finite set of input-output points with low sensitivity to noise in the input patterns. The sensitivity at the input patterns is minimized by minimizing the derivative of the input-output map of the interpolating network at the input patterns. The idea is that if the derivative at an input point is small, then a small variation around that point will produce a small variation in the output. This idea is made precise in the next section.

The approach presented here gives a direct method for determining the weights. The input-output map defined by these weights interpolates exactly through the given set of points, and the derivative at the input patterns can be made arbitrarily small. Exact interpolation requires the inversion of a nonsingular matrix. If the exact interpolation requirement is relaxed, this inversion can be avoided: it is possible to determine weights so that the network approximately interpolates through the given set of points with any desired degree of accuracy and with a sensitivity as small as desired. Both the accuracy of interpolation and the sensitivity to noise are controlled by the size of the weights in the first layer. Estimates of how large these weights must be to achieve a desired interpolation accuracy and noise sensitivity are also presented, as well as an algorithm for determining the weights.

Other authors have studied direct methods for weight assignment. By direct methods we mean nonrecursive methods, that is, methods that determine the weights as a well-defined, explicit function of the input-output pairs to be implemented. Reference 1 shows how to approximately interpolate, with any desired degree of accuracy, through $2m - 1$ points with a network that has $m$ neurons in the hidden layer and sigmoidal activation functions. It is also known that one can exactly interpolate through $m + 1$ points with a network that has $m$ neurons in the hidden layer and various types of activation functions (see, for instance, References 2 through 4). Here, the interpolation is through $m + 1$ points using a network with $m$ neurons in the hidden layer, and it is done in such a way that the derivative at the points of interpolation can be controlled and made arbitrarily small. The input and output spaces can be multidimensional.

The weight assignment techniques for approximate interpolation can also be applied to find a good set of initial weights for problems that involve learning more points than the net has degrees of freedom. This can be an important application, since the speed of convergence of iterative learning algorithms is well known to depend strongly on the choice of initial weights (References 5 and 6).


The notation required to present these results is introduced in Section 2, where we also state one of our main results and some preliminary results. In Section 3, we define weights and biases for a family of neural networks that solve the exact interpolation problem. The family is parametrized by a vector $w \in \mathbb{R}^m$ whose size controls the derivative of the input-output map at the interpolation points. We show that these derivatives can be made arbitrarily small by increasing the components of the vector $w$. Approximate interpolation with small sensitivity is addressed in Section 4, where an algorithm for approximate interpolation with low sensitivity is presented and illustrated with simple examples. Some of the more tedious proofs are relegated to the Appendix.


2.  NOTATION  AND  STATEMENT  OF  MAIN  RESULTS 


We shall consider feed-forward neural networks with one hidden layer consisting of $m$ neurons, each of which has a nonlinear activation function that will be denoted by $S$. The activation function $S$ is assumed to be a continuous function mapping the real line $\mathbb{R}$ into the open interval $(-1, 1)$ with

$$\lim_{t \to \pm\infty} S(t) = \pm 1 .$$

For an $m$-vector $y$ with components $y_1, y_2, \ldots, y_m$, we define the $m$-dimensional sigmoid $S_m$ by the formula

$$S_m(y) = \begin{bmatrix} S(y_1) \\ S(y_2) \\ \vdots \\ S(y_m) \end{bmatrix} \qquad (y \in \mathbb{R}^m) .$$


The collection of $k \times \ell$ real matrices is denoted by $\mathbb{R}^{k \times \ell}$ and the space of $k$-dimensional real vectors by $\mathbb{R}^k$, where $k$ and $\ell$ are any two positive integers. If the network has $n$ inputs and $\ell$ outputs, then the transfer function (input-output map) of the network is a function $F : \mathbb{R}^n \to \mathbb{R}^\ell$ of the form

$$F(Z) = \alpha_0 + \alpha\, S_m(WZ + \beta) \qquad (Z \in \mathbb{R}^n),$$

where $W \in \mathbb{R}^{m \times n}$ represents the first layer of weights, $\alpha \in \mathbb{R}^{\ell \times m}$ represents the second layer of weights, and the vectors $\beta \in \mathbb{R}^m$ and $\alpha_0 \in \mathbb{R}^\ell$ are bias vectors.

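If $S$ is differentiable, then so are $S_m$ and $F$. Let $S_m'$ and $F'$ denote the derivatives of $S_m$ and $F$, respectively. Then $S_m' : \mathbb{R}^m \to \mathbb{R}^{m \times m}$ and $F' : \mathbb{R}^n \to \mathbb{R}^{\ell \times n}$ are given by

$$S_m'(y) = \operatorname{diag}\big(S'(y_1), S'(y_2), \ldots, S'(y_m)\big) \qquad (y \in \mathbb{R}^m),$$

$$F'(Z) = \alpha\, S_m'(WZ + \beta)\, W \qquad (Z \in \mathbb{R}^n), \tag{2.1}$$

where $S'$ denotes the derivative of $S$, and $\operatorname{diag}(S'(y_1), \ldots, S'(y_m))$ is the diagonal matrix with $S'(y_1), S'(y_2), \ldots, S'(y_m)$ along the diagonal. Note that the $ij$th component of the $\ell \times n$ matrix $F'(Z)$ is given by $[F'(Z)]_{ij} = \dfrac{\partial F_i}{\partial Z_j}(Z)$, where $F_i$ is the $i$th component of $F$ and $Z_j$ is the $j$th component of $Z$ ($1 \le i \le \ell$, $1 \le j \le n$).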
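As a concrete check of Equation 2.1, the Python sketch below evaluates $F(Z) = \alpha_0 + \alpha S_m(WZ + \beta)$ and $F'(Z) = \alpha S_m'(WZ + \beta) W$ for a small network and compares $F'(Z)$ with central finite differences. It assumes $S = \tanh$ (one admissible sigmoid); the helper names and all numerical values are illustrative only.

```python
import math


def S(t):
    # Illustrative activation: tanh maps R into (-1, 1) with limits -1 and +1
    return math.tanh(t)


def dS(t):
    # Derivative of tanh
    return 1.0 - math.tanh(t) ** 2


def matvec(M, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def F(Z, alpha0, alpha, W, beta):
    """Transfer function F(Z) = alpha0 + alpha S_m(W Z + beta)."""
    hidden = [S(u + b) for u, b in zip(matvec(W, Z), beta)]
    return [c + s for c, s in zip(alpha0, matvec(alpha, hidden))]


def F_prime(Z, alpha, W, beta):
    """Jacobian F'(Z) = alpha S'_m(W Z + beta) W, an l x n matrix (Equation 2.1)."""
    d = [dS(u + b) for u, b in zip(matvec(W, Z), beta)]
    m, n = len(W), len(W[0])
    return [[sum(row[k] * d[k] * W[k][j] for k in range(m)) for j in range(n)]
            for row in alpha]


# n = 2 inputs, m = 2 hidden neurons, l = 1 output (arbitrary values)
alpha0, alpha = [0.1], [[1.0, -0.5]]
W, beta = [[0.3, -1.2], [2.0, 0.7]], [0.2, -0.4]
Z, h = [0.5, -0.25], 1e-6

J = F_prime(Z, alpha, W, beta)
fd = []
for j in range(2):
    Zp, Zm = list(Z), list(Z)
    Zp[j] += h
    Zm[j] -= h
    fd.append((F(Zp, alpha0, alpha, W, beta)[0] - F(Zm, alpha0, alpha, W, beta)[0]) / (2 * h))
print(J[0], fd)  # analytic and finite-difference derivatives agree closely
```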


Remark 2.1. For a vector-valued function $F : \mathbb{R}^n \to \mathbb{R}^\ell$ such as the one above, the (total) derivative $F'(Z)$ of $F$ at $Z \in \mathbb{R}^n$ is, by definition (see Reference 7 or 8), a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^\ell$ satisfying

$$\lim_{h \to 0} \frac{\| F(Z+h) - F(Z) - F'(Z)h \|}{\| h \|} = 0,$$

where $\|\cdot\|$ denotes the underlying vector norm. This means that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that if $\|h\| < \delta$, then

$$\| F(Z+h) - F(Z) - F'(Z)h \| < \varepsilon \|h\| .$$

For $\|h\| < \delta$, the above inequality and the triangle inequality imply (with $\|F'(Z)\|$ the operator norm induced by the vector norms)

$$\| F(Z+h) - F(Z) \| \le \| F'(Z)h \| + \varepsilon \|h\| \le \big[\|F'(Z)\| + \varepsilon\big] \|h\| . \tag{2.2}$$
Inequality 2.2 has a significant interpretation. If $Z$ represents a fixed input to the network with desired output $F(Z)$ and $h$ represents a small ($\|h\| < \delta$) perturbation of the exact input $Z$, then Inequality 2.2 asserts that the output $F(Z+h)$ to the perturbed input $Z + h$ will be within a distance $\delta[\|F'(Z)\| + \varepsilon]$ of the desired output $F(Z)$. Therefore, by making $\|F'(Z)\|$ small, the output of the network to an input perturbed by noise will remain close to the desired output. ////*
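Inequality 2.2 says that, near $Z$, the output moves by at most roughly $\|F'(Z)\|$ times the size of the input perturbation. A quick numerical check for a one-input/one-output net with $S = \tanh$ (illustrative weights, chosen arbitrarily) shows the ratio $|F(z+h) - F(z)|/|h|$ approaching $|F'(z)|$ as $h$ shrinks:

```python
import math

# One-input/one-output net F(z) = alpha0 + sum_i alpha_i tanh(w_i z + beta_i); values are arbitrary
alpha0, alpha, w, beta = 0.2, [1.5, -0.7], [4.0, 1.0], [-1.0, 0.3]


def F(z):
    return alpha0 + sum(a * math.tanh(wi * z + b) for a, wi, b in zip(alpha, w, beta))


def dF(z):
    # Equation 2.1 with n = l = 1: F'(z) = sum_i alpha_i w_i tanh'(w_i z + beta_i)
    return sum(a * wi * (1.0 - math.tanh(wi * z + b) ** 2) for a, wi, b in zip(alpha, w, beta))


z = 0.3
ratios = []
for h in (1e-1, 1e-2, 1e-3):
    ratios.append(abs(F(z + h) - F(z)) / h)  # output change per unit of input perturbation
print(ratios, abs(dF(z)))
```

A smaller $|F'(z)|$ would shrink every one of these ratios, which is exactly the sensitivity the paper seeks to control.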


Given a finite set of input-output pairs

$$\Omega = \{ (x_i, y_i) \in \mathbb{R}^n \times \mathbb{R}^\ell : 0 \le i \le m \text{ and } x_i \ne x_j \text{ when } i \ne j \},$$

we shall say that $F$ interpolates through $\Omega$ if $F(x_i) = y_i$ for $i = 0, 1, 2, \ldots, m$.

Throughout this paper, the matrix $W$ of first-layer weights will be given by an outer product $W = wv$, where $v^T \in \mathbb{R}^n$ will be fixed and chosen so that $vx_0 < vx_1 < \cdots < vx_m$. The $m$-vector $w$ will belong to an unbounded open subset $G$ of $\mathbb{R}^m$. The second-layer weight matrix $\alpha$ and the bias vectors $\alpha_0$ and $\beta$ will be defined as functions on $G$. Thus, we shall define (in the next section) functions $\alpha : G \to \mathbb{R}^{\ell \times m}$, $\beta : G \to \mathbb{R}^m$, and $\alpha_0 : G \to \mathbb{R}^\ell$, and a family of neural networks $F_w$ ($w \in G$) of the form

$$F_w(z) = \alpha_0(w) + \alpha(w)\, S_m\big(wvz + \beta(w)\big) \qquad (z \in \mathbb{R}^n) \tag{2.3}$$

such that $F_w$ interpolates through $\Omega$ for every $w \in G$. Moreover, under certain conditions on the sigmoid $S$, the function $\beta : G \to \mathbb{R}^m$ can be chosen so that

$$\lim_{w \to \infty} F_w'(x_i) = 0 \qquad \text{for } 0 \le i \le m .$$

The notation $w \to \infty$ means that $w_i \to \infty$ for all $i = 1, 2, \ldots, m$, where $w_i$ ($1 \le i \le m$) are the components of $w \in \mathbb{R}^m$.

These results, when combined with Remark 2.1, show that there exist neural networks that interpolate through the set $\Omega$ with an arbitrarily small sensitivity to noise at the inputs $x_i$ ($0 \le i \le m$).


*  The  symbol  ////  indicates  the  end  of  a  proof,  an  example,  or  a  remark. 
3. WEIGHT ASSIGNMENT FOR EXACT INTERPOLATION AND DERIVATIVE CONTROL


In this section we shall define a set $G$ in $\mathbb{R}^m$ and a family of neural networks $F_w : \mathbb{R}^n \to \mathbb{R}^\ell$ ($w \in G$) of the form

$$F_w(x) = \alpha_0(w) + \alpha(w)\, S_m\big(wvx + \beta(w)\big) \qquad (x \in \mathbb{R}^n) \tag{3.1}$$

such that $F_w$ interpolates through $\Omega$ for every $w \in G$. To solve the interpolation problem, it is required only that the activation function $S : \mathbb{R} \to (-1, 1)$ be continuous with $\lim_{t \to \pm\infty} S(t) = \pm 1$.

Next we will show that if the sigmoid $S$ satisfies certain conditions for derivative control, then the bias vector function $\beta : G \to \mathbb{R}^m$ can be defined in such a way that the derivative of $F_w$ can be made arbitrarily small at the points of interpolation.

3.1.  EXACT  INTERPOLATION 

Let a set of interpolation points $\Omega = \{ (x_i, y_i) \in \mathbb{R}^n \times \mathbb{R}^\ell : 0 \le i \le m \text{ and } x_i \ne x_j \text{ when } i \ne j \}$ be given.

The first step is to find a vector $v^T \in \mathbb{R}^n$ such that

$$vx_0 < vx_1 < vx_2 < \cdots < vx_m . \tag{3.2}$$

Such a vector $v^T$ always exists, as asserted by the next lemma; however, we may have to relabel the $x_i$ ($0 \le i \le m$).

Lemma 3.1. Given distinct points $x_0, x_1, x_2, \ldots, x_m$ in $\mathbb{R}^n$, there exists a vector $v^T \in \mathbb{R}^n$ such that $\{vx_i : 0 \le i \le m\}$ is a set of distinct numbers.

This  lemma  is  proved  in  the  Appendix. 

Note that $v$ denotes a row vector, while its transpose $v^T$ denotes a column vector in $\mathbb{R}^n$.
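Numerically, a direction drawn at random almost surely works, since the directions that fail lie in the finitely many hyperplanes $v(x_i - x_j) = 0$. The sketch below is a heuristic under that observation, not the Appendix's proof: it draws Gaussian directions until the projections $vx_i$ are distinct and then returns the relabelling that makes Inequalities 3.2 hold. The function name `find_direction` is hypothetical.

```python
import random


def find_direction(points, seed=0, max_tries=100):
    """Return (v, order) with the projections v . x_i pairwise distinct; `order` lists the
    point indices sorted by increasing v . x_i, i.e. the relabelling for Inequalities 3.2."""
    rng = random.Random(seed)
    n = len(points[0])
    for _ in range(max_tries):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        proj = [sum(vj * xj for vj, xj in zip(v, x)) for x in points]
        if len(set(proj)) == len(proj):  # all projections distinct
            order = sorted(range(len(points)), key=lambda i: proj[i])
            return v, order
    raise ValueError("no separating direction found; are the points distinct?")


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v, order = find_direction(points)
relabelled = [points[i] for i in order]  # x_0, ..., x_m after relabelling
print(v, order)
```

In floating point one may prefer to also require a minimum gap between consecutive projections, since nearly equal $vx_i$ make the later construction poorly conditioned.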
Given $v^T \in \mathbb{R}^n$ satisfying Inequalities 3.2, we define $W = wv$ with $w \in \mathbb{R}^m$. Next, one selects any $m$ continuous functions $\Lambda_k : (t_k, \infty) \to (0, \infty)$ ($1 \le k \le m$) that grow slower than linearly; that is, they satisfy the Growth Condition

$$\lim_{t \to \infty} \big[\, a t + \Lambda_k(t) \,\big] = \begin{cases} +\infty & \text{if } a \ge 0 \\ -\infty & \text{if } a < 0 \end{cases} \qquad (1 \le k \le m). \tag{3.3}$$

For example, $\Lambda_k(t) = (t - t_k)^{\varepsilon}$ for $t > t_k$ and $0 < \varepsilon < 1$, $k = 1, 2, \ldots, m$.


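The bias vector function $\beta$ is defined on the open set $X = \prod_{k=1}^{m} (t_k, \infty) \subset \mathbb{R}^m$. If $w = [w_1, w_2, \ldots, w_m]^T \in X$, then the $i$th component of $\beta(w)$ is given by

$$\beta_i(w) = \Lambda_i(w_i) - w_i v x_i \qquad (w \in X,\ 1 \le i \le m). \tag{3.4}$$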


To simplify the notation, let $L_w : \mathbb{R}^n \to \mathbb{R}^m$ denote the affine transformation defined for each $w \in X$ by

$$L_w(x) = wvx + \beta(w) \qquad (x \in \mathbb{R}^n).$$

Note that for each $w \in X$, $L_w$ is the transformation between the input layer and the hidden layer. By Equation 3.4, the $i$th component of $L_w(x)$ can be written as

$$[L_w(x)]_i = w_i v(x - x_i) + \Lambda_i(w_i) \qquad (x \in \mathbb{R}^n,\ 1 \le i \le m). \tag{3.5}$$

Let $A(w)$ denote the $m \times m$ matrix whose $j$th column equals

$$S_m(L_w(x_j)) - S_m(L_w(x_{j-1})) \qquad (1 \le j \le m,\ w \in X). \tag{3.6}$$


Note that since the sigmoid $S$ and the functions $\Lambda_i$ ($1 \le i \le m$) are continuous, the matrix-valued mapping $w \mapsto A(w)$ defines a continuous map $A : X \to \mathbb{R}^{m \times m}$.

The set $G$ is defined to be the collection of all vectors $w \in X$ for which $A(w)$ is an invertible matrix. We shall see shortly that $G$ is an unbounded open set in $\mathbb{R}^m$.


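The matrix-valued function $\alpha : G \to \mathbb{R}^{\ell \times m}$ is defined by

$$\alpha(w) = Y A^{-1}(w) \qquad (w \in G), \tag{3.7}$$

where $Y = [\, y_1 - y_0 : y_2 - y_1 : \cdots : y_m - y_{m-1} \,] \in \mathbb{R}^{\ell \times m}$. Finally, $\alpha_0 : G \to \mathbb{R}^\ell$ is defined as

$$\alpha_0(w) = y_0 - \alpha(w)\, S_m(L_w(x_0)) \qquad (w \in G). \tag{3.8}$$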


Our first theorem shows how this construction solves the interpolation problem. It also shows why $m$ neurons in the hidden layer suffice to interpolate through $m + 1$ points.

Theorem 3.1. For each $w \in G$, the layered neural network

$$F_w(x) = \alpha_0(w) + \alpha(w)\, S_m(L_w(x)) \qquad (x \in \mathbb{R}^n)$$

interpolates through $\Omega$.

Proof. The proof is by induction. Fix $w \in G$. The definition of $\alpha_0(w)$ (Equation 3.8) clearly implies $F_w(x_0) = y_0$. Assume that $F_w(x_k) = y_k$ for some $k$ with $0 \le k < m$. If $e_{k+1}$ denotes the $(k+1)$th column of the $m \times m$ identity matrix, then

$$F_w(x_{k+1}) - F_w(x_k) = Y A^{-1}(w) \big[ S_m(L_w(x_{k+1})) - S_m(L_w(x_k)) \big] = Y A^{-1}(w) A(w) e_{k+1} = Y e_{k+1} = y_{k+1} - y_k .$$

Here we used the definition of $\alpha(w)$ (Equation 3.7) and the definition of the $(k+1)$st column of $A(w)$ (see Expression 3.6). It follows that $F_w(x_{k+1}) = y_{k+1}$. This completes the proof.
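The construction of Section 3.1 is short enough to sketch directly. The Python below assumes $S = \tanh$ and $\Lambda_k(t) = \sqrt{t}$ (the example after Condition 3.3 with $t_k = 0$ and $\varepsilon = 1/2$). It builds $\beta(w)$ (Equation 3.4), $A(w)$ (Expression 3.6), $\alpha(w) = YA^{-1}(w)$ (Equation 3.7) by solving $A^T(w)\alpha^T = Y^T$ instead of forming the inverse, and $\alpha_0(w)$ (Equation 3.8). Function names are illustrative, and the code checks membership $w \in G$ only by detecting a numerically singular $A(w)$.

```python
import math


def S(t):
    return math.tanh(t)


def solve(A, B):
    """Solve A X = B (A: m x m, B: m x p) by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda q: abs(M[q][c]))
        if abs(M[piv][c]) < 1e-14:
            raise ValueError("A(w) is numerically singular, so w is not in G")
        M[c], M[piv] = M[piv], M[c]
        for row in range(m):
            if row != c:
                f = M[row][c] / M[c][c]
                M[row] = [x - f * y for x, y in zip(M[row], M[c])]
    return [[M[i][m + j] / M[i][i] for j in range(len(B[0]))] for i in range(m)]


def build_network(xs, ys, v, w, Lam):
    """Weights of Section 3.1. xs, ys: the m+1 pairs of Omega, ordered so that
    v.x_0 < ... < v.x_m; w: m positive numbers; Lam: list of m functions Lambda_k.
    Python index i = 0..m-1 is neuron i+1 of the text, paired with x_{i+1}."""
    m, l = len(xs) - 1, len(ys[0])
    dot = lambda p, q: sum(a * b for a, b in zip(p, q))
    vx = [dot(v, x) for x in xs]
    beta = [Lam[i](w[i]) - w[i] * vx[i + 1] for i in range(m)]            # Equation 3.4

    def hidden(x):                                                        # S_m(L_w(x))
        return [S(w[i] * dot(v, x) + beta[i]) for i in range(m)]

    H = [hidden(x) for x in xs]
    A = [[H[j + 1][i] - H[j][i] for j in range(m)] for i in range(m)]     # Expression 3.6
    Y = [[ys[j + 1][r] - ys[j][r] for j in range(m)] for r in range(l)]   # l x m
    # alpha(w) = Y A^{-1}(w)  <=>  A^T alpha^T = Y^T                       (Equation 3.7)
    alphaT = solve([[A[j][i] for j in range(m)] for i in range(m)],
                   [[Y[r][j] for r in range(l)] for j in range(m)])
    alpha = [[alphaT[j][r] for j in range(m)] for r in range(l)]
    alpha0 = [ys[0][r] - dot(alpha[r], H[0]) for r in range(l)]          # Equation 3.8

    def F(x):
        h = hidden(x)
        return [alpha0[r] + dot(alpha[r], h) for r in range(l)]
    return F


xs = [[0.0], [1.0], [2.5], [4.0]]      # n = 1, m = 3, with v = [1.0]
ys = [[1.0], [-0.5], [2.0], [0.0]]     # l = 1
F = build_network(xs, ys, [1.0], [3.0, 3.0, 3.0], [math.sqrt] * 3)
print([F(x)[0] for x in xs])           # reproduces the y_k up to rounding
```

Solving the transposed system rather than inverting $A(w)$ is a standard numerical choice; it yields the same $\alpha(w)$ as Equation 3.7.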
For the construction above to work, it is essential that $A(w)$ be invertible for some values of $w$. This is guaranteed by the next proposition, which is a consequence of the Growth Condition 3.3 and the asymptotic properties of $S$.


Proposition 3.1. $\lim_{w \to \infty} A(w) = 2I_m$, where $I_m$ denotes the $m \times m$ identity matrix. Consequently, $G$ is an unbounded open subset of $\mathbb{R}^m$. More precisely, there exist $T_k \ge t_k$ large enough ($1 \le k \le m$) such that the product $\prod_{k=1}^{m} (T_k, \infty)$ is contained in $G$.

Recall that $\lim_{w \to \infty}$ means that $w_i \to \infty$ for all $i = 1, 2, \ldots, m$.


Proof. If $U$ denotes the collection of all invertible matrices in $\mathbb{R}^{m \times m}$, then $U$ is an open set containing $2I_m$. Therefore, if the above limit holds, then $U$ contains $A(w)$ for all $w$ large enough. This implies that $G$ contains the product $\prod_{k=1}^{m} (T_k, \infty)$ for $T_k$ large enough ($1 \le k \le m$). Moreover, since $A : X \to \mathbb{R}^{m \times m}$ is continuous, $G = A^{-1}(U)$ (the preimage of $U$ under the map $A$) is open in $X$, hence open in $\mathbb{R}^m$.

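To prove that $A(w)$ converges to $2I_m$ for large $w$, let $A_{ij}(w)$ denote the $ij$th entry of $A(w)$, $1 \le i \le m$, $1 \le j \le m$. Equation 3.5 and Expression 3.6 give

$$A_{ij}(w) = S\big(w_i v(x_j - x_i) + \Lambda_i(w_i)\big) - S\big(w_i v(x_{j-1} - x_i) + \Lambda_i(w_i)\big)$$

$$= \begin{cases} S(\Lambda_i(w_i)) - S\big(w_i v(x_{i-1} - x_i) + \Lambda_i(w_i)\big) & \text{if } j = i \\ S\big(w_i v(x_{i+1} - x_i) + \Lambda_i(w_i)\big) - S(\Lambda_i(w_i)) & \text{if } j = i + 1 \\ S\big(w_i v(x_j - x_i) + \Lambda_i(w_i)\big) - S\big(w_i v(x_{j-1} - x_i) + \Lambda_i(w_i)\big) & \text{if } j < i \text{ or } j > i + 1 . \end{cases}$$

Hence it follows from the choice of $v$ (Inequalities 3.2), the asymptotic properties of $S$, and the Growth Condition 3.3 that

$$\lim_{w \to \infty} A_{ij}(w) = \begin{cases} 2 & \text{if } j = i \\ 0 & \text{if } j \ne i . \end{cases}$$

(For $j < i$ both arguments of $S$ tend to $-\infty$, for $j > i + 1$ both tend to $+\infty$, and $\Lambda_i(w_i) \to \infty$ by Condition 3.3 with $a = 0$.) This completes the proof of the proposition.

////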
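Proposition 3.1 can be watched numerically. The sketch below assumes $S = \tanh$, $\Lambda_k(t) = \sqrt{t}$, and projections $vx_k = 0, 1, 2.5, 4$ chosen only for illustration; it records the largest entrywise deviation of $A(w)$ from $2I_m$ as all components of $w$ grow together.

```python
import math


def A_matrix(vx, w, Lam, S=math.tanh):
    """A(w) of Expression 3.6 from the projections vx[k] = v x_k (strictly increasing).
    Entry (i, j) is S([L_w(x_{j+1})]_i) - S([L_w(x_j)]_i) in 0-based Python indices,
    with [L_w(x)]_i = w_i v(x - x_{i+1}) + Lambda(w_i) (Equation 3.5)."""
    m = len(vx) - 1
    H = [[S(w[i] * (vx[k] - vx[i + 1]) + Lam(w[i])) for i in range(m)] for k in range(m + 1)]
    return [[H[j + 1][i] - H[j][i] for j in range(m)] for i in range(m)]


def dist_to_2I(A):
    return max(abs(A[i][j] - (2.0 if i == j else 0.0))
               for i in range(len(A)) for j in range(len(A)))


vx = [0.0, 1.0, 2.5, 4.0]
dists = [dist_to_2I(A_matrix(vx, [t] * 3, math.sqrt)) for t in (2.0, 10.0, 50.0)]
print(dists)  # shrinks toward 0 as the common value of the w_i grows
```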
Remark 3.1. The Growth Condition 3.3 on the functions $\Lambda_k$ ($1 \le k \le m$) was instrumental in the proof of Proposition 3.1, which hinges on the fact that $A(w)$ converges to an invertible matrix as $w \to \infty$; this guarantees the invertibility of $A(w)$ for $w$ in the unbounded set $G$. It should be pointed out that if one is only interested in solving the interpolation problem, then one may do without the Growth Condition 3.3 and replace the functions $\Lambda_k$ by arbitrary constants. If the $\Lambda_k$ are constants ($1 \le k \le m$), then $A(w)$ still converges to an invertible matrix $M$ as $w \to \infty$. Thus, Proposition 3.1 will hold for arbitrary constants $\Lambda_k$ if $2I_m$ is replaced by the matrix $M$, which has the form

$$M = \begin{bmatrix} a_1 & b_1 & 0 & \cdots & 0 \\ 0 & a_2 & b_2 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{m-1} & b_{m-1} \\ 0 & 0 & \cdots & 0 & a_m \end{bmatrix}$$

with $a_k = 1 + S(\Lambda_k)$ and $b_k = 1 - S(\Lambda_k)$ ($1 \le k \le m$); $M$ is invertible because it is upper triangular with diagonal entries $a_k > 0$. Note that $a_k \to 2$ and $b_k \to 0$ if $\Lambda_k \to \infty$ ($1 \le k \le m$). Of course, Theorem 3.1 holds whenever $A(w)$ is invertible, independent of what the limit of $A(w)$ might be as $w \to \infty$. It is because the Growth Condition 3.3 will be required to control the derivative of $F_w$ at $x_i$ ($0 \le i \le m$) that we chose to present this approach for solving the interpolation problem. Moreover, the fact that $A^{-1}(w) \to \tfrac{1}{2} I_m$ as $w \to \infty$ will lead to a simple formula for the weight matrix $\alpha$, namely, $\tfrac{1}{2} Y$, which will solve the interpolation problem approximately without matrix inversions. ////
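To see Remark 3.1 at work, the sketch below (again assuming $S = \tanh$ and illustrative projections $vx_k$) uses constant $\Lambda_k$, evaluates $A(w)$ for a large common value of the $w_k$, and compares it with the bidiagonal matrix $M$. Because $M$ is triangular, its determinant is $\prod_k a_k$, which is positive.

```python
import math


def A_matrix(vx, w, lam, S=math.tanh):
    """A(w) of Expression 3.6 when each Lambda_i is a constant lam[i].
    Neuron i (0-based) is paired with x_{i+1}; vx[k] = v x_k is strictly increasing."""
    m = len(vx) - 1
    H = [[S(w[i] * (vx[k] - vx[i + 1]) + lam[i]) for i in range(m)] for k in range(m + 1)]
    return [[H[j + 1][i] - H[j][i] for j in range(m)] for i in range(m)]


def limit_M(lam, S=math.tanh):
    """Upper bidiagonal M with a_k = 1 + S(Lambda_k) on the diagonal and
    b_k = 1 - S(Lambda_k) just above it (Remark 3.1)."""
    m = len(lam)
    M = [[0.0] * m for _ in range(m)]
    for k in range(m):
        M[k][k] = 1.0 + S(lam[k])
        if k + 1 < m:
            M[k][k + 1] = 1.0 - S(lam[k])
    return M


vx = [0.0, 1.0, 2.5, 4.0]
lam = [0.5, 1.0, 2.0]                      # arbitrary constant values of Lambda_k
A = A_matrix(vx, [60.0] * 3, lam)          # a large common value of the w_k
M = limit_M(lam)
gap = max(abs(A[i][j] - M[i][j]) for i in range(3) for j in range(3))
det_M = math.prod(M[k][k] for k in range(3))   # triangular, so det M = a_1 a_2 a_3
print(gap, det_M)
```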


3.2.  DERIVATIVE  CONTROL 

To control the limiting behavior of $F_w'$ at the points of interpolation as $w \to \infty$, we require that the functions $\Lambda_k$ approach infinity as $w \to \infty$ in a particular way. Given $r_k \ge 0$, we assume that the functions $\Lambda_k : (t_k, \infty) \to (0, \infty)$ ($1 \le k \le m$) in Equation 3.4 satisfy the Growth Condition 3.3 and the two conditions below:


$$\begin{cases} \lim_{t \to \infty} t\, S'(\Lambda_k(t)) = 0 & \text{if } r_k = 0 \\ t\, S'(\Lambda_k(t)) = r_k \ \text{ for } t > t_k & \text{if } r_k > 0 \end{cases} \tag{3.9}$$

NWC  TP  7191 


$$\lim_{t \to \infty} t\, S'\big(a t + \Lambda_k(t)\big) = 0 \qquad \text{for } a \ne 0 \text{ and } 1 \le k \le m . \tag{3.10}$$


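These conditions for derivative control are satisfied by a vast class of differentiable sigmoids. This class includes commonly used sigmoids such as the hyperbolic tangent, for which

$$\Lambda_k(t) = \cosh^{-1}\!\big[\sqrt{t / r_k}\,\big], \quad t > r_k, \quad \text{when } r_k > 0,$$

and

$$\Lambda_k(t) = \cosh^{-1}\!\big[\sqrt{t^{1+\varepsilon}}\,\big], \quad t > 1, \quad \text{for any } \varepsilon > 0, \quad \text{when } r_k = 0 .$$

Another example is the inverse tangent $S(t) = \tfrac{2}{\pi} \tan^{-1}(t)$ ($t \in \mathbb{R}$), for which

$$\Lambda_k(t) = \sqrt{(2t/\pi r_k) - 1}, \quad t > \pi r_k / 2, \quad \text{when } r_k > 0,$$

and

$$\Lambda_k(t) = \sqrt{(2t^{3/2}/\pi) - 1}, \quad t > (\pi/2)^{2/3}, \quad \text{when } r_k = 0 .$$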

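For the error-function sigmoid

$$S(t) = \frac{2}{\sqrt{\pi}} \int_{-\infty}^{t} e^{-x^2}\, dx - 1 \qquad (t \in \mathbb{R}),$$

$$\Lambda_k(t) = \big[\ln 2t - \ln(\sqrt{\pi}\, r_k)\big]^{1/2}, \quad t > \sqrt{\pi}\, r_k / 2, \quad \text{when } r_k > 0,$$

and

$$\Lambda_k(t) = \big[\ln\big(2t^{1+\varepsilon}/\sqrt{\pi}\big)\big]^{1/2}, \quad t > (\sqrt{\pi}/2)^{1/(1+\varepsilon)}, \quad \text{for any } \varepsilon > 0, \quad \text{when } r_k = 0 .$$

It is not hard to show that the examples above satisfy Conditions 3.3, 3.9, and 3.10 (see Remark 3.2).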
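The example functions are easy to check numerically. The sketch below codes $S'$ for the three sigmoids above together with the corresponding $\Lambda_k$, and evaluates $t\,S'(\Lambda_k(t))$, which should equal $r_k$ when $r_k > 0$ and $t^{-\varepsilon}$ (hence tend to 0) when $r_k = 0$. The value $\varepsilon = 1/2$ is an arbitrary choice that matches the $t^{3/2}$ in the inverse-tangent formula; $\Lambda_k(t)/t$ is printed as a crude indicator of the sublinear growth required by Condition 3.3.

```python
import math

# Derivatives of the three example sigmoids
dS = {
    "tanh": lambda u: 1.0 / math.cosh(u) ** 2,
    "atan": lambda u: (2.0 / math.pi) / (1.0 + u * u),            # S(t) = (2/pi) arctan t
    "erf": lambda u: (2.0 / math.sqrt(math.pi)) * math.exp(-u * u),
}

# Lambda_k for r_k > 0, so that t * S'(Lambda_k(t)) = r_k on the stated domains
lam_pos = {
    "tanh": lambda t, r: math.acosh(math.sqrt(t / r)),                            # t > r
    "atan": lambda t, r: math.sqrt(2.0 * t / (math.pi * r) - 1.0),                # t > pi r / 2
    "erf": lambda t, r: math.sqrt(math.log(2.0 * t) - math.log(math.sqrt(math.pi) * r)),
}

# Lambda_k for r_k = 0, so that t * S'(Lambda_k(t)) = t**(-EPS) -> 0
EPS = 0.5
lam_zero = {
    "tanh": lambda t: math.acosh(math.sqrt(t ** (1.0 + EPS))),                    # t > 1
    "atan": lambda t: math.sqrt(2.0 * t ** 1.5 / math.pi - 1.0),                  # t > (pi/2)**(2/3)
    "erf": lambda t: math.sqrt(math.log(2.0 * t ** (1.0 + EPS) / math.sqrt(math.pi))),
}

r = 0.5
for name in dS:
    for t in (10.0, 1e3, 1e5):
        lp, lz = lam_pos[name](t, r), lam_zero[name](t)
        print(name, t, t * dS[name](lp), t * dS[name](lz), lz / t)
```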

When the derivative of the sigmoid is strictly decreasing on some infinite interval of the positive real axis, as in the examples above, the derivative $S'$ is invertible on that interval. Thus, one may solve the equation $t\, S'(\Lambda_k(t)) = r_k$ to obtain a unique function $\Lambda_k$ for each $r_k > 0$. If $(S')^{-1}$ denotes the inverse of $S'$ on an appropriate domain, then $\Lambda_k(t) = (S')^{-1}(r_k / t)$ for appropriate values of $t$. When $r_k = 0$, one simply chooses a decaying function $f$ such that $f(t) \to 0$ as $t \to \infty$ and solves $t\, S'(\Lambda_k(t)) = f(t)$ to get $\Lambda_k(t) = (S')^{-1}\big(t^{-1} f(t)\big)$ for $t$ in an appropriate domain. For example, $f(t) = t^{-\varepsilon}$ ($t > 0$), with $\varepsilon > 0$ judiciously chosen so that $\Lambda_k$ satisfies the Growth Condition 3.3.
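When no closed form is at hand, the inversion described above can be done numerically. Under the stated assumption that $S'$ is strictly decreasing on $(0, \infty)$ and tends to 0, the map $u \mapsto t\,S'(u) - r_k$ is strictly decreasing, so bisection finds the unique $\Lambda_k(t)$. The sketch below (function name hypothetical) checks itself against the hyperbolic-tangent formulas, including the $r_k = 0$ case with $f(t) = t^{-1/2}$.

```python
import math


def solve_lambda(dS, t, target, tol=1e-12):
    """Return u > 0 with t * dS(u) = target, assuming dS is strictly decreasing on
    (0, inf), tends to 0 at infinity, and t * dS(0) > target > 0."""
    g = lambda u: t * dS(u) - target          # strictly decreasing in u
    if g(0.0) <= 0.0:
        raise ValueError("t is too small: no solution with u > 0")
    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:                        # bracket the root by doubling
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * max(1.0, hi):       # bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def dS_tanh(u):
    return 1.0 - math.tanh(u) ** 2


t, r = 50.0, 0.5
numeric = solve_lambda(dS_tanh, t, r)                  # r_k > 0 case
closed_form = math.acosh(math.sqrt(t / r))
f = lambda s: s ** -0.5                                # a decaying f for the r_k = 0 case
numeric0 = solve_lambda(dS_tanh, t, f(t))              # solves t S'(Lambda) = f(t)
print(numeric, closed_form, numeric0)
```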

The following lemma sheds light on some relationships among Conditions 3.3, 3.9, and 3.10 under certain assumptions on the sigmoid $S$.

Lemma 3.2. Suppose that $S : \mathbb{R} \to (-1, 1)$ is a differentiable odd function with $S'$ nonincreasing on $(0, \infty)$. Suppose that for every $r_k \ge 0$ there exist $t_k > 0$ and $\Lambda_k : (t_k, \infty) \to (0, \infty)$ such that Condition 3.9 and Condition 3.3 with $a < 0$ hold. Then Condition 3.3 holds for $a = 0$ (and all $a > 0$), and Condition 3.10 holds for all $a \ne 0$.

The proof may be found in the Appendix.

Remark 3.2. It should be clear from Lemma 3.2 and the observations preceding it that for odd sigmoids whose derivative is strictly decreasing on some infinite interval of the positive real axis, functions $\Lambda_k$ satisfying Condition 3.9 always exist and are in fact unique when $r_k > 0$. Consequently, only the Growth Condition 3.3 with $a < 0$ must be verified when dealing with such sigmoids. ////

Theorem 3.2. If the sigmoid $S$ and the functions $\Lambda_k$ ($1 \le k \le m$) in the definition of $\beta$ (Equation 3.4) satisfy Conditions 3.3, 3.9, and 3.10, then the family $F_w$ ($w \in G$) of Theorem 3.1 interpolates through $\Omega$ and

$$\lim_{w \to \infty} F_w'(x_k) = \begin{cases} 0 & \text{for } k = 0 \\ \tfrac{1}{2} r_k (y_k - y_{k-1})\, v & \text{for } 1 \le k \le m . \end{cases}$$


Proof. Since the $\Lambda_k$ ($1 \le k \le m$) satisfy the Growth Condition 3.3, it follows from Theorem 3.1 that $F_w$ interpolates through $\Omega$ for all $w \in G$.

Now, by Equation 2.1 and the definitions of $W$, $\beta(w)$, $L_w$, and $\alpha(w)$, for each $w \in G$, $F_w'(x_k)$ is given by

$$F_w'(x_k) = Y A^{-1}(w)\, S_m'(L_w(x_k))\, w v \qquad (0 \le k \le m). \tag{3.11}$$
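Since $\lim_{w \to \infty} A^{-1}(w) = \tfrac{1}{2} I_m$ (by Proposition 3.1 and the continuity of matrix inversion), it suffices to show that

$$\lim_{w \to \infty} S_m'(L_w(x_k))\, w = \begin{cases} 0 & \text{for } k = 0 \\ r_k e_k & \text{for } 1 \le k \le m , \end{cases} \tag{3.12}$$

where $e_k$ denotes the $k$th column of the $m \times m$ identity matrix $I_m$. Indeed, if the Limit 3.12 holds, then by Equation 3.11,

$$\lim_{w \to \infty} F_w'(x_k) = \begin{cases} \tfrac{1}{2} r_k Y e_k v = \tfrac{1}{2} r_k (y_k - y_{k-1})\, v & \text{for } 1 \le k \le m \\ 0 & \text{for } k = 0 . \end{cases}$$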


To establish the Limit 3.12, consider the $i$th component of the vector $S_m'(L_w(x_k))\, w$ ($1 \le i \le m$, $0 \le k \le m$):

$$[S_m'(L_w(x_k))\, w]_i = w_i\, S'\big(w_i v(x_k - x_i) + \Lambda_i(w_i)\big) . \tag{3.13}$$

When $i \ne k$, $v(x_k - x_i) \ne 0$. Thus, Condition 3.10 and Equation 3.13 imply

$$\lim_{w \to \infty} [S_m'(L_w(x_k))\, w]_i = 0 \qquad (0 \le k \le m,\ 1 \le i \le m,\ i \ne k).$$

When $i = k$, the choice of $\Lambda_k$, Condition 3.9, and Equation 3.13 imply

$$\lim_{w \to \infty} [S_m'(L_w(x_k))\, w]_k = r_k \qquad (1 \le k \le m).$$

The last two limits show that the Limit 3.12 holds. This completes the proof of the theorem. ////
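Theorem 3.2 can be checked numerically for $n = \ell = 1$ and $v = 1$. The sketch below assumes $S = \tanh$ with $\Lambda_k(t) = \cosh^{-1}[\sqrt{t/r_k}]$ (the $r_k > 0$ example after Condition 3.10), evaluates Equation 3.11 at a large common value of the $w_k$, and compares the result with the limits $0$ and $\tfrac{1}{2} r_k (y_k - y_{k-1})$. The data and the value $w_k = 10^5$ are illustrative. The derivative of tanh is written as $1 - \tanh^2$ so that very large arguments give 0 instead of overflowing.

```python
import math


def S(u):
    return math.tanh(u)


def dS(u):
    # tanh'(u) = 1 - tanh(u)^2; unlike 1/cosh(u)**2 this returns 0.0 for huge |u|
    return 1.0 - math.tanh(u) ** 2


def lam(t, r):
    # tanh example with r > 0: t * S'(lam(t, r)) = r whenever t > r (Condition 3.9)
    return math.acosh(math.sqrt(t / r))


def solve_linear(A, b):
    """Solve the small square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda q: abs(M[q][c]))
        M[c], M[p] = M[p], M[c]
        for row in range(c + 1, n):
            f = M[row][c] / M[c][c]
            M[row] = [x - f * y for x, y in zip(M[row], M[c])]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def node_derivatives(xs, ys, r, w):
    """F_w'(x_k), k = 0..m, from Equation 3.11 with n = l = 1 and v = 1.
    Neuron i (0-based) is paired with x_{i+1} and uses the Lambda built from r[i]."""
    m = len(xs) - 1
    lams = [lam(w[i], r[i]) for i in range(m)]

    def arg(i, x):                                   # [L_w(x)]_i, Equation 3.5
        return w[i] * (x - xs[i + 1]) + lams[i]

    H = [[S(arg(i, x)) for i in range(m)] for x in xs]
    A = [[H[j + 1][i] - H[j][i] for j in range(m)] for i in range(m)]
    Y = [ys[j + 1] - ys[j] for j in range(m)]
    alpha = solve_linear([[A[j][i] for j in range(m)] for i in range(m)], Y)   # alpha A = Y
    return [sum(alpha[i] * dS(arg(i, x)) * w[i] for i in range(m)) for x in xs]


xs, ys = [0.0, 1.0, 2.5, 4.0], [1.0, -0.5, 2.0, 0.0]
r = [1.0, 0.5, 2.0]
approx = node_derivatives(xs, ys, r, [1e5] * 3)
limits = [0.0] + [0.5 * r[k - 1] * (ys[k] - ys[k - 1]) for k in range(1, 4)]
print(approx)
print(limits)   # Theorem 3.2: 0 at x_0 and (1/2) r_k (y_k - y_{k-1}) at x_k
```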
By setting $r_k = 0$ for $k = 1, 2, \ldots, m$ in Theorem 3.2, we arrive at one of the main results of this paper.

Corollary 3.1 (Exact Interpolation With Low Sensitivity). There exists a family of neural networks $F_w : \mathbb{R}^n \to \mathbb{R}^\ell$ ($w \in G$) such that each $F_w$ interpolates through $\Omega$ and

$$\lim_{w \to \infty} F_w'(x_k) = 0 \qquad \text{for } 0 \le k \le m .$$

Proof. The examples appearing after Condition 3.10 give sigmoids $S$ and functions $\Lambda_k$ for every $r_k \ge 0$ that satisfy the hypothesis of Theorem 3.2. Thus, the corollary follows from the theorem with $r_k = 0$ ($1 \le k \le m$). ////

Another result that follows as a special case of Theorem 3.2 when $n = \ell = 1$ states that the values of a one-input/one-output net with $m$ hidden neurons can be exactly specified at $m + 1$ points, and the derivatives at $m$ of those points can be approximately assigned with any degree of accuracy, subject to a sign restriction.

Theorem 3.3. Let $(x_k, y_k) \in \mathbb{R}^2$ ($0 \le k \le m$) be $m + 1$ points such that $x_0 < x_1 < \cdots < x_m$, and let $d_k$ ($0 \le k \le m$) be $m + 1$ real numbers satisfying one of the two conditions below:

(a) $d_0 = 0$ and, for $k > 0$, $d_k = 0$ if $y_k - y_{k-1} = 0$; otherwise, $d_k (y_k - y_{k-1}) > 0$.

(b) $d_m = 0$ and, for $k < m$, $d_k = 0$ if $y_{k+1} - y_k = 0$; otherwise, $d_k (y_{k+1} - y_k) > 0$.

Then there exists a family of neural nets $F_w : \mathbb{R} \to \mathbb{R}$ ($w \in G$) with $m$ hidden neurons such that

$$F_w(x_k) = y_k \quad \text{and} \quad \lim_{w \to \infty} F_w'(x_k) = d_k \qquad (0 \le k \le m) .$$


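Proof. Assume that Condition (a) holds. Set $r_k = 0$ if $y_k - y_{k-1} = 0$; otherwise, set $r_k = \dfrac{2 d_k}{y_k - y_{k-1}}$ ($1 \le k \le m$). Let $S$ denote any of the sigmoids in the examples appearing after Condition 3.10, and let $\Lambda_k$ correspond to $r_k$ ($1 \le k \le m$) as in Theorem 3.2. Since $r_k \ge 0$ ($1 \le k \le m$), Theorem 3.2 applies with $v = 1$ and gives $\lim_{w \to \infty} F_w'(x_k) = \tfrac{1}{2} r_k (y_k - y_{k-1}) = d_k$ for $1 \le k \le m$ and $\lim_{w \to \infty} F_w'(x_0) = 0 = d_0$, which is the result.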
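If Condition (b) holds, set $\tilde{x}_k = x_{m-k}$ and $\tilde{y}_k = y_{m-k}$ ($0 \le k \le m$), and let $\tilde{r}_k = r_{m-k}$, where $r_k = 0$ if $y_{k+1} - y_k = 0$; otherwise, $r_k = \dfrac{2 d_k}{y_{k+1} - y_k}$ ($0 \le k \le m-1$). Since $\tilde{r}_k \ge 0$ ($1 \le k \le m$), Theorem 3.2 applies to the set $\tilde{\Omega} = \{ (\tilde{x}_k, \tilde{y}_k) \in \mathbb{R}^2 : 0 \le k \le m \}$ with $v = -1$ (so that $v\tilde{x}_0 < v\tilde{x}_1 < \cdots < v\tilde{x}_m$), and we obtain $F_w : \mathbb{R} \to \mathbb{R}$ ($w \in G$) such that $F_w$ interpolates through $\tilde{\Omega}$ (hence through the original points) and

$$\lim_{w \to \infty} F_w'(\tilde{x}_k) = \begin{cases} 0 & \text{for } k = 0 \\ \tfrac{1}{2} \tilde{r}_k (\tilde{y}_k - \tilde{y}_{k-1})\, v & \text{for } 1 \le k \le m . \end{cases} \tag{3.14}$$

Now, by Equation 3.14 with $v = -1$ and the definitions of $\tilde{x}_k$, $\tilde{y}_k$, and $\tilde{r}_k$, we have

$$\lim_{w \to \infty} F_w'(x_k) = \lim_{w \to \infty} F_w'(\tilde{x}_{m-k}) = -\tfrac{1}{2} \tilde{r}_{m-k} (\tilde{y}_{m-k} - \tilde{y}_{m-k-1}) = \tfrac{1}{2} r_k (y_{k+1} - y_k) = d_k \qquad (0 \le k \le m-1)$$

and

$$\lim_{w \to \infty} F_w'(x_m) = \lim_{w \to \infty} F_w'(\tilde{x}_0) = 0 = d_m .$$

This completes the proof. ////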

We close this section with some comments about the last result. A network with $m$ hidden neurons, one input, and one output has $3m + 1$ degrees of freedom, namely, the components of the $m$-vectors $\alpha^T$, $w$, and $\beta$ and the constant $\alpha_0$. Theorem 3.3 exhibits a family parametrized by vectors $w$ belonging to the unbounded open set $G$ in $\mathbb{R}^m$. Each $F_w$ interpolates through $m + 1$ points, which accounts for $m + 1$ degrees of freedom. The parameter $w$ accounts for $m$ degrees of freedom. The remaining $m$ degrees of freedom were used to approximately assign the derivatives at $m$ of the interpolation points, within the restrictions of Conditions (a) and (b) of Theorem 3.3.
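The proof of Theorem 3.3 reduces slope assignment to a choice of the $r_k$ in Theorem 3.2. The helper below (hypothetical name, pure bookkeeping) checks the sign conditions (a) and (b) as stated in the theorem and returns the corresponding $r_k$; these could then be fed to any $\Lambda_k$ construction, for instance the hyperbolic-tangent formula $\Lambda_k(t) = \cosh^{-1}[\sqrt{t/r_k}]$ when $r_k > 0$. Under Condition (b) the points are then traversed in reverse order with $v = -1$, as in the proof.

```python
def rates_from_slopes(ys, ds):
    """Compute the r_k of the proof of Theorem 3.3 from target slopes d_k.
    Returns ('a', [r_1, ..., r_m]) under Condition (a) or
    ('b', [r_0, ..., r_{m-1}]) under Condition (b); raises ValueError otherwise."""
    m = len(ys) - 1

    def rates(diffs, slopes):
        out = []
        for dy, d in zip(diffs, slopes):
            if dy == 0.0:
                if d != 0.0:
                    return None          # a flat step must get slope 0
                out.append(0.0)
            elif d * dy > 0.0:
                out.append(2.0 * d / dy)  # so that (1/2) r_k dy = d_k
            else:
                return None              # sign restriction violated
        return out

    if ds[0] == 0.0:   # Condition (a): d_k paired with y_k - y_{k-1}, k = 1..m
        r = rates([ys[k] - ys[k - 1] for k in range(1, m + 1)], ds[1:])
        if r is not None:
            return "a", r
    if ds[m] == 0.0:   # Condition (b): d_k paired with y_{k+1} - y_k, k = 0..m-1
        r = rates([ys[k + 1] - ys[k] for k in range(m)], ds[:m])
        if r is not None:
            return "b", r
    raise ValueError("the slopes d_k violate both sign conditions of Theorem 3.3")


ys = [1.0, -0.5, 2.0, 0.0]
ds_a = [0.0, -1.0, 0.8, -3.0]
ds_b = [-0.4, 1.0, -2.0, 0.0]
kind_a, r_a = rates_from_slopes(ys, ds_a)
kind_b, r_b = rates_from_slopes(ys, ds_b)
print(kind_a, r_a)
print(kind_b, r_b)
```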
4. APPROXIMATE INTERPOLATION WITH LOW SENSITIVITY, ALGORITHM, AND EXAMPLES


An interesting consequence of Proposition 3.1 is that $\lim_{w \to \infty} A^{-1}(w) = \tfrac{1}{2} I_m$. Consequently, $\lim_{w \to \infty} \alpha(w) = \tfrac{1}{2} Y$. Thus, one may be tempted to replace the second matrix of weights $\alpha(w) = Y A^{-1}(w)$ by $\tfrac{1}{2} Y$, since this choice of weights avoids having to compute the inverse of $A(w)$. With this choice of $\alpha$, the interpolation through $\Omega$ will not be exact; it will improve, however, as $w$ increases. In this section, this idea will be explored. We shall derive conditions that determine how large $w$ must be in order to approximately interpolate through $\Omega$ within a given error tolerance, and with the derivative entries $[F'(x_k)]_{ij}$ within a prescribed distance from zero ($0 \le k \le m$, $1 \le i \le \ell$, $1 \le j \le n$), using $\alpha = \tfrac{1}{2} Y$.

The neural network map with $\alpha = \tfrac{1}{2} Y$ will be denoted by $T_w$. It can be written as

$$T_w(x) = y_0 + \tfrac{1}{2} Y \big[ S_m(L_w(x)) - S_m(L_w(x_0)) \big] \qquad (x \in \mathbb{R}^n), \tag{4.1}$$

where $L_w$ is defined in terms of $w$, $v$, and $\beta(w)$ as in Section 3, and $w$ may be any vector in $X$.
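With $\alpha = \tfrac{1}{2} Y$ no linear system has to be solved. The sketch below assumes one input, one output, $v = 1$, $S = \tanh$, and the $r_k = 0$ hyperbolic-tangent example $\Lambda_k(t) = \cosh^{-1}[\sqrt{t^{1+\varepsilon}}]$ with $\varepsilon = 1/2$ (the data are illustrative). It evaluates $T_w$ from Equation 4.1 and records the largest interpolation error at the nodes for increasing common values of the $w_k$.

```python
import math


def approx_net(xs, ys, w, Lam):
    """T_w of Equation 4.1 for n = l = 1 and v = 1, with S = tanh (assumed).
    Neuron i (0-based) is paired with x_{i+1}."""
    m = len(xs) - 1
    beta = [Lam(w[i]) - w[i] * xs[i + 1] for i in range(m)]         # Equation 3.4
    Y = [ys[j + 1] - ys[j] for j in range(m)]
    h0 = [math.tanh(w[i] * xs[0] + beta[i]) for i in range(m)]      # S_m(L_w(x_0))

    def T(x):
        h = [math.tanh(w[i] * x + beta[i]) for i in range(m)]
        return ys[0] + 0.5 * sum(Y[i] * (h[i] - h0[i]) for i in range(m))
    return T


def lam(t, eps=0.5):
    # tanh example with r_k = 0: t * S'(lam(t)) = t**(-eps), valid for t > 1
    return math.acosh(math.sqrt(t ** (1.0 + eps)))


xs, ys = [0.0, 1.0, 2.5, 4.0], [1.0, -0.5, 2.0, 0.0]
errors = []
for t in (5.0, 50.0, 500.0):
    T = approx_net(xs, ys, [t] * 3, lam)
    errors.append(max(abs(T(x) - y) for x, y in zip(xs, ys)))
print(errors)  # no matrix inversion; the error shrinks as w grows
```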

Our first lemma gives a bound on the size of the error $T_w(x_j) - y_j$ in terms of the size of the vectors $y_j$ ($0 \le j \le m$). The absolute value of a real number $z$ will be denoted by $|z|$. If $z$ is a vector with components $z_1, z_2, \ldots, z_k$, then $|z| = [\,|z_1|, |z_2|, \ldots, |z_k|\,]^T$. If $z'$ is another $k$-vector, then $|z| \le |z'|$ means $|z_i| \le |z'_i|$ for $i = 1, 2, \ldots, k$.


Remark 4.1. Since the sigmoid $S : \mathbb{R} \to (-1, 1)$ is continuous with $\lim_{t \to \pm\infty} S(t) = \pm 1$, given any number $\delta \in (0, 1)$ one can find $a > 0$ large enough that $1 - \delta < S(t) < 1$ for all $t > a$. ////
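For increasing sigmoids the constant $a$ of Remark 4.1 can be written in closed form. As an illustration, assuming the hyperbolic tangent or the inverse-tangent sigmoid $S(t) = \tfrac{2}{\pi}\tan^{-1} t$, the smallest admissible $a$ is $\tanh^{-1}(1 - \delta)$ and $\tan(\pi(1 - \delta)/2)$, respectively:

```python
import math


def threshold_tanh(delta):
    """Smallest a with 1 - delta < tanh(t) for every t > a (tanh is increasing)."""
    return math.atanh(1.0 - delta)


def threshold_atan(delta):
    """Same for S(t) = (2/pi) arctan(t): S(t) > 1 - delta  <=>  t > tan(pi (1 - delta) / 2)."""
    return math.tan(0.5 * math.pi * (1.0 - delta))


for delta in (0.1, 0.01, 0.001):
    print(delta, threshold_tanh(delta), threshold_atan(delta))
```

The inverse tangent approaches its asymptotes only algebraically, so its threshold grows much faster as $\delta \to 0$ than that of the hyperbolic tangent.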


Lemma 4.1. Choose $\delta_1 \in (0, 1)$ and $a > 0$ such that $1 - \delta_1 < S(t) < 1$ for all $t > a$. If $w \in X$ has positive components and satisfies the following two conditions,

$$\Lambda_\mu(w_\mu) > a \qquad (1 \le \mu \le m), \tag{4.2}$$

$$w_\mu v(x_{\mu-1} - x_\mu) + \Lambda_\mu(w_\mu) < -a \qquad (1 \le \mu \le m), \tag{4.3}$$

then

|T^(xj)-yj|  ^l-Si  I  lyil 

/  1=0 

(4.4) 

This  lemma  is  proved  in  the  Appendix. 

The next lemma gives a bound on the size of $[T_w'(x_k)]_{ij}$ in terms of the $j$th component of $v$ and the size of the $i$th component of the differences $|y_\mu - y_{\mu-1}|$ ($1 \le \mu \le m$, $1 \le i \le \ell$, $1 \le j \le n$).


Lemma 4.2. Choose $\delta_2 > 0$ and assume that $S$ is an odd differentiable function with $S'$ nonincreasing on $(0, \infty)$. If the neural network map $T_w$ is given by Equation 4.1 and if $w = [w_1, w_2, \ldots, w_m]^T \in X$ has positive components and satisfies the following two conditions,

$$0 < w_\mu S'(\Lambda_\mu(w_\mu)) < \delta_2 \qquad (1 \le \mu \le m), \tag{4.5}$$

$$0 < w_\mu S'\big(w_\mu v(x_{\mu-1} - x_\mu) + \Lambda_\mu(w_\mu)\big) < \delta_2 \quad \text{with} \quad w_\mu v(x_{\mu-1} - x_\mu) + \Lambda_\mu(w_\mu) < 0 \qquad (1 \le \mu \le m), \tag{4.6}$$

where $(y_\mu - y_{\mu-1})_i$ denotes the $i$th component of the vector $y_\mu - y_{\mu-1}$, $\mu = 1, 2, \ldots, m$, and $[T_w'(x_k)]_{ij}$ is the $ij$th entry of the matrix $T_w'(x_k)$.
The proof of this lemma may be found in the Appendix.

Lemma 4.1 establishes the connection between the asymptotic values of the sigmoid and the size of the errors in the approximate interpolation. It shows that the errors can be made arbitrarily small by choosing $w$ large enough, as specified by Inequalities 4.2 and 4.3. Its counterpart, Lemma 4.2, shows the connection between the asymptotic values of the functions $f_{a,\mu}(t) = t\, S'(at + \Lambda_\mu(t))$ ($t > t_\mu$, $1 \le \mu \le m$, $a \in \mathbb{R}$) and the size of the derivatives at the points of interpolation. It shows that the derivatives can be made arbitrarily small by choosing $w$ large enough, as specified by Inequalities 4.5 and 4.6.

The next theorem, which is the main result of this section, puts these results together in a proof showing that when the functions $\Lambda_k : (t_k, \infty) \to (0, \infty)$ approach infinity as $w \to \infty$ in the particular way described in Section 3, one can in fact find a vector $\hat{w} \in X$ such that the errors and the derivatives at the interpolation points are arbitrarily small for all $w \ge \hat{w}$ (inequalities between vectors being understood componentwise). Before stating the theorem, it should be emphasized that the two lemmas above hold true if the functions $\Lambda_k : (t_k, \infty) \to (0, \infty)$ ($1 \le k \le m$) are constant functions. That is, the values $\Lambda_\mu(w_\mu)$ appearing in Inequalities 4.2, 4.3, 4.5, and 4.6 can be fixed constants independent of $w_\mu$ ($1 \le \mu \le m$) without invalidating the proofs of the two lemmas. Notice, however, that Inequality 4.5 cannot hold for all $w \ge \hat{w}$ unless $S'(\Lambda_\mu(w_\mu))$ decreases as $w_\mu$ increases without bound, forcing $\Lambda_\mu(w_\mu)$ to vary with $w_\mu$. Since the main purpose of these two lemmas is to facilitate the proof of Theorem 4.1, which requires Inequality 4.5 to hold for all $w \ge \hat{w}$, we chose to state the lemmas in a manner that indicates the possibility that $\Lambda_\mu(w_\mu)$ may vary with $w_\mu$ ($1 \le \mu \le m$).

Theorem 4.1 (Approximate Interpolation With Low Sensitivity). Assume that $S$ is an odd differentiable function with $S'$ nonincreasing on $(0, \infty)$. Let $\Lambda_k : (t_k, \infty) \to (0, \infty)$ satisfy the Growth Condition 3.3, Condition 3.10, and Condition 3.9 with $r_k = 0$ ($1 \le k \le m$). Let $T_w : \mathbb{R}^n \to \mathbb{R}^\ell$ be given by Equation 4.1. Then, for any $\varepsilon_1 > 0$ and $\varepsilon_2 > 0$, there exists $\hat{w} \in X$ such that for all $w \ge \hat{w}$,

$$\big| (T_w(x_j) - y_j)_i \big| < \varepsilon_1 \qquad (1 \le i \le \ell,\ 0 \le j \le m)$$

and

$$\big| [T_w'(x_k)]_{ij} \big| < \varepsilon_2 \qquad (1 \le i \le \ell,\ 1 \le j \le n,\ 0 \le k \le m),$$

where $(T_w(x_j) - y_j)_i$ denotes the $i$th component of $T_w(x_j) - y_j$.


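Proof. Let $y_{ji}$ denote the $i$th component of $y_j$ ($1 \le i \le \ell$, $0 \le j \le m$). Let $\delta_1 \in (0, 1)$ and $\delta_2 > 0$ satisfy

$$\frac{3}{2}\,\delta_1 \sum_{j=0}^{m} |y_{ji}| < \varepsilon_1 \qquad (1 \le i \le \ell) \qquad (4.8)$$

$$\frac{1}{2}\,\delta_2\,|v_j| \sum_{\mu=1}^{m} |y_{\mu i} - y_{(\mu-1)i}| < \varepsilon_2 \qquad (1 \le i \le \ell,\ 1 \le j \le n) \qquad (4.9)$$

and choose $\alpha > 0$ as in Lemma 4.1. Then, clearly, by Lemmas 4.1 and 4.2, it suffices to show that there exists $\hat{w} \in X$ such that Inequalities 4.2, 4.3, 4.5, and 4.6 hold for all $w \ge \hat{w}$.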

Fix $\mu \in \{1, 2, \ldots, m\}$. The Growth Condition 3.3 clearly implies that there exists $w'_\mu > t_\mu$ such that Inequalities 4.2 and 4.3 hold for all $w_\mu \ge w'_\mu$. Similarly, Condition 3.9 with $r_\mu = 0$ implies that there exists $w''_\mu > t_\mu$ such that Inequality 4.5 holds for all $w_\mu \ge w''_\mu$. Finally, the Growth Condition 3.3 and Condition 3.10 imply that there exists $w'''_\mu > t_\mu$ such that Inequality 4.6 holds for all $w_\mu \ge w'''_\mu$.

By letting $\hat{w}_\mu = \max\{w'_\mu, w''_\mu, w'''_\mu\}$ for each $\mu \in \{1, 2, \ldots, m\}$, we obtain $\hat{w} = [\hat{w}_1, \hat{w}_2, \ldots, \hat{w}_m]^T \in X$ with the required properties. ////

The functions $A_\mu$ ($1 \le \mu \le m$) in Theorem 4.1 have in common that they all satisfy Condition 3.9 with $r_\mu = 0$, namely, $\lim_{t\to\infty} t\,S'(A_\mu(t)) = 0$ ($1 \le \mu \le m$), and they satisfy the Growth Condition 3.3 with $a \le 0$, namely,

$$\lim_{t\to\infty} \left[ at + A_\mu(t) \right] = \begin{cases} +\infty & \text{if } a = 0 \\ -\infty & \text{if } a < 0. \end{cases}$$

For some sigmoids there are several choices of functions that satisfy the above two limits with different rates of convergence, and in some applications it may be advantageous to select different functions $A_\mu$ in order to satisfy Inequalities 4.2, 4.3, 4.5, and 4.6 with smaller values of $w_\mu$ ($1 \le \mu \le m$). The last two lemmas and the theorem were stated with sufficient generality to accommodate different functions $A_\mu$. If, however, the functions $A_\mu$ ($1 \le \mu \le m$) are all the same function, then the conditions of the lemmas can be simplified. Before closing this section, we present these simplifications and briefly discuss, qualitatively, when and why one would choose functions $A_\mu$ with different rates of convergence in the two limits above.

Proposition 4.1. Let $S$ be an odd differentiable function such that, on $(0, \infty)$, $S'$ is nonincreasing and positive. Assume that $A_\mu = A$ for all $\mu = 1, 2, \ldots, m$. Let

$$a = \min_{1 \le \mu \le m} v(x_\mu - x_{\mu-1}).$$

If $w > 0$ satisfies the following three inequalities

$$A(w) > \alpha \qquad (4.10)$$

$$-\frac{a}{2}\,w + A(w) < 0 \qquad (4.11)$$

$$0 < w\,S'(A(w)) < \delta_2 \qquad (4.12)$$

and $w_\mu = w$ for all $\mu = 1, 2, \ldots, m$, then Inequalities 4.2, 4.3, 4.5, and 4.6 hold for all $\mu = 1, 2, \ldots, m$. Here $\alpha$ and $\delta_2$ are as in Lemmas 4.1 and 4.2, respectively.

Proof. Since $A_\mu = A$ and $w_\mu = w$ ($1 \le \mu \le m$), clearly Inequality 4.10 implies Inequality 4.2, and Inequality 4.12 implies Inequality 4.5. Since $-aw + A(w) = 2\left[-\frac{a}{2}w + A(w)\right] - A(w)$, Inequalities 4.10 and 4.11 together imply $-aw + A(w) < -A(w) < -\alpha$, which implies Inequality 4.3 for $1 \le \mu \le m$ by the definition of $a$. Now, the next series of inequalities follows from the definition of $a$, the facts that $S' > 0$ and $S'$ is nondecreasing on $(-\infty, 0)$, Inequality 4.11, $S'$ being even, and Inequality 4.12:

$$0 < w_\mu\,S'\!\left(w_\mu v(x_{\mu-1} - x_\mu) + A_\mu(w_\mu)\right) \le w\,S'(-wa + A(w)) \le w\,S'(-A(w)) = w\,S'(A(w)) < \delta_2.$$

Therefore, Inequality 4.6 holds for all $\mu = 1, 2, \ldots, m$. ////

In Proposition 4.1 we are simply taking advantage of the fact that once Inequality 4.3 is satisfied for the $\mu$ that gives the smallest value of $v(x_\mu - x_{\mu-1})$, the same value $w_\mu$ satisfies Inequality 4.3 for all the other values of $\mu$. However, a very small value of $v(x_\mu - x_{\mu-1})$ may require an extremely large value of $w_\mu$ to satisfy Inequality 4.3, while the same inequality may be satisfied by more conservative values of $w_\mu$ for the other values of $\mu$.

In cases where a large discrepancy exists between the terms $v(x_\mu - x_{\mu-1})$ ($1 \le \mu \le m$), it may be better to satisfy each of the Inequalities 4.3 with different values for $w_\mu$. Moreover, since the terms $w_\mu v(x_{\mu-1} - x_\mu)$ and $A_\mu(w_\mu)$ in Inequality 4.3 compete against each other, in the sense that $A_\mu(w_\mu)$ increases with $w_\mu$ while $w_\mu v(x_{\mu-1} - x_\mu)$ decreases linearly with $w_\mu$, it may be advantageous to choose functions $A_\mu$ with different rates of divergence depending on the sizes of the terms $v(x_\mu - x_{\mu-1})$ ($1 \le \mu \le m$). Qualitatively speaking, a slow rate of divergence for $A_\mu$ implies a smaller value for $w_\mu$ in Inequality 4.3, but it also implies a larger value for $w_\mu$ in Inequality 4.2. A faster rate of divergence for $A_\mu$, of course, would imply the opposite.

Finally, we should point out that the sizes of the terms $v(x_\mu - x_{\mu-1})$ ($1 \le \mu \le m$) also depend on the choice of $v$. How to choose $v$ and the functions $A_\mu$ optimally will not be discussed here; these issues require further research. We do believe, however, that, as a general rule, the faster $S$ converges to 1, the slower $A_\mu$ will grow and the smaller the weights will be.

The  following  simple  examples  illustrate  some  of  the  points  mentioned 
above.  Hopefully,  they  also  will  help  the  reader  appreciate  the  simplicity  of  the 
technique  for  determining  weights  that  follows  from  Theorem  4.1. 

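Throughout these examples the sigmoid will be the hyperbolic tangent, $S(t) = \tanh(t)$, with derivative $S'(t) = \operatorname{sech}^2(t)$ ($t \in \mathbb{R}$). It is not hard to show that, for any $\eta > 0$, the function

$$A_\eta(t) = \cosh^{-1}\!\left(t^{(\eta+1)/2}\right) \qquad (t > 1)$$

satisfies

$$t\,S'(A_\eta(t)) = t^{-\eta}$$

for all $t > 1$, because $\operatorname{sech}^2(\cosh^{-1} u) = 1/u^2$ for $u \ge 1$.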
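As a quick numerical sanity check of the identity $t\,S'(A_\eta(t)) = t^{-\eta}$, the following Python sketch evaluates $A_\eta(t) = \cosh^{-1}(t^{(\eta+1)/2})$, its inverse $[\cosh(a)]^{2/(\eta+1)}$, and $t\,S'(A_\eta(t))$ for $\eta = 1$ and $\eta = 3$. It is our illustration, not part of the report; the helper names `A`, `A_inv`, and `dS` are hypothetical, and `dS` writes $\operatorname{sech}^2$ in a form that cannot overflow for large arguments.

```python
import math

def A(eta: float, t: float) -> float:
    """A_eta(t) = arccosh(t ** ((eta + 1) / 2)), defined for t > 1."""
    return math.acosh(t ** ((eta + 1) / 2))

def A_inv(eta: float, a: float) -> float:
    """Inverse of A_eta: [cosh(a)] ** (2 / (eta + 1)) for a > 0."""
    return math.cosh(a) ** (2 / (eta + 1))

def dS(x: float) -> float:
    """S'(x) = sech(x)**2 for S = tanh, written as 4e/(1+e)**2 with e = exp(-2|x|)
    so that large |x| underflows to 0 instead of overflowing."""
    e = math.exp(-2.0 * abs(x))
    return 4.0 * e / (1.0 + e) ** 2

# Check t * S'(A_eta(t)) == t**(-eta) and A_inv(A(t)) == t at a few points.
for eta in (1, 3):
    for t in (1.5, 10.0, 100.0):
        assert math.isclose(t * dS(A(eta, t)), t ** (-eta), rel_tol=1e-9)
        assert math.isclose(A_inv(eta, A(eta, t)), t, rel_tol=1e-9)
```

The same helpers give the values $t_\alpha = A_\eta^{-1}(\alpha)$ used in the examples below, e.g. `A_inv(3, 4.402)` is about 6.39.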


The set $X \subset \mathbb{R}^m$ is $X = \prod_{\mu=1}^{m} (1, \infty)$. The functions $A_\eta$ and $S$ are strictly increasing. The algorithm that we shall use is based on Proposition 4.1. For each $a > 0$ and $\eta > 0$, let $g_{a,\eta}$ denote the function appearing on the left-hand side of Inequality 4.11; that is,

$$g_{a,\eta}(t) = -\frac{a}{2}\,t + A_\eta(t) \qquad (t > 1).$$

Let $t^*_\eta(a)$ denote the value of $t$ where $g_{a,\eta}$ achieves its maximum value. When $\eta = 1$ or $\eta = 3$, one can find a closed-form expression for $t^*_\eta(a)$:

$$t^*_1(a) = \frac{\sqrt{a^2 + 4}}{a}, \qquad t^*_3(a) = \frac{\sqrt{8 + \sqrt{64 + a^4}}}{a} \qquad (a > 0) \qquad (4.13)$$

Since $g_{a,\eta}$ is decreasing on $[t^*_\eta(a), \infty)$, we shall look in this interval for a value of $w$ that satisfies Inequality 4.11. Note that since $w\,S'(A_\eta(w)) = w^{-\eta}$, Inequality 4.12 is satisfied by all $w > (1/\delta_2)^{1/\eta}$. Similarly, since $A_\eta$ is an increasing function, Inequality 4.10 is satisfied by all $w > A_\eta^{-1}(\alpha)$, where $A_\eta^{-1}$ denotes the inverse of the function $A_\eta$. Thus, the strategy will be to find $w \ge \max\{(1/\delta_2)^{1/\eta}, A_\eta^{-1}(\alpha), t^*_\eta(a)\}$ that satisfies Inequality 4.11. The inverse of $A_\eta$ is given by

$$A_\eta^{-1}(a) = [\cosh(a)]^{2/(\eta+1)} \qquad (a > 0).$$

Note that when $a$ is small, one can use the following approximations:

$$t^*_1(a) \approx \frac{2}{a} \quad \text{and} \quad t^*_3(a) \approx \frac{4}{a}, \qquad \text{for "small } a\text{" } (0 < a < 1).$$
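The closed forms for $t^*_\eta(a)$ ($\eta = 1, 3$) and the small-$a$ approximations $2/a$ and $4/a$ can be checked numerically. The sketch below assumes $S = \tanh$ and $A_\eta(t) = \cosh^{-1}(t^{(\eta+1)/2})$; the helper names `g` and `t_star` are hypothetical. It confirms that $g_{a,\eta}$ does not increase when one steps away from the closed-form maximizer in either direction.

```python
import math

def g(eta: float, a: float, t: float) -> float:
    """g_{a,eta}(t) = -(a/2) t + A_eta(t) with A_eta(t) = arccosh(t**((eta+1)/2))."""
    return -(a / 2) * t + math.acosh(t ** ((eta + 1) / 2))

def t_star(eta: int, a: float) -> float:
    """Maximizer of g_{a,eta} in closed form (available only for eta = 1 or 3)."""
    if eta == 1:
        return math.sqrt(a * a + 4) / a
    if eta == 3:
        return math.sqrt(8 + math.sqrt(64 + a ** 4)) / a
    raise ValueError("closed form available only for eta = 1 or eta = 3")

for eta in (1, 3):
    for a in (0.1, 0.9, 1.0):
        ts = t_star(eta, a)
        for h in (1e-3, 1e-2):
            # At a maximizer, moving left or right must not increase g.
            assert g(eta, a, ts) >= g(eta, a, ts + h)
            assert g(eta, a, ts) >= g(eta, a, ts - h)

# Compare with the small-a approximations 2/a and 4/a at a = 0.1.
print(t_star(1, 0.1), 2 / 0.1, t_star(3, 0.1), 4 / 0.1)
```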


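The Problem. Given $x_0, x_1, \ldots, x_m$ in $\mathbb{R}^n$, $y_0, y_1, \ldots, y_m$ in $\mathbb{R}^\ell$, $\varepsilon_1 > 0$, and $\varepsilon_2 > 0$, find $\hat{w} \in X$ such that

$$\left| [T_w(x_j) - y_j]_i \right| < \varepsilon_1 \qquad (1 \le i \le \ell,\ 0 \le j \le m)$$

and

$$\left| [T_w'(x_k)]_{ij} \right| < \varepsilon_2 \qquad (1 \le i \le \ell,\ 1 \le j \le n,\ 0 \le k \le m)$$

for all $w \ge \hat{w}$.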
Step 2.

2.1. Choose $v$ in $\mathbb{R}^n$ at random.

2.2. Order the numbers $vx_k$ ($0 \le k \le m$) and relabel:

$$vx_{k_0} \le vx_{k_1} \le \cdots \le vx_{k_m}.$$

2.3. Compute the consecutive differences of the above numbers, $a_\mu = v(x_{k_\mu} - x_{k_{\mu-1}})$ ($1 \le \mu \le m$), and order the numbers $a_\mu$:

$$0 \le a_{\mu_1} \le a_{\mu_2} \le \cdots \le a_{\mu_m}.$$

If $a_{\mu_1} = 0$, repeat Step 2.
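Step 2 can be sketched in Python as follows. This is a minimal illustration, not the report's code: `order_projections` (a hypothetical name) performs Steps 2.2 and 2.3 for a given row vector $v$, and `step2` draws $v$ uniformly from $[-1, 1]^n$, which is one arbitrary reading of "at random", repeating the draw while the smallest gap $a_{\mu_1}$ is zero. Indices follow the text: `k` holds $k_0, \ldots, k_m$, `a[mu - 1]` holds $a_\mu$, and `mu_order` holds $\mu_1, \ldots, \mu_m$.

```python
import random

def order_projections(xs, v):
    """Steps 2.2-2.3 for a fixed row vector v.

    xs: input points x_0..x_m, each a list of n numbers.
    Returns (k, a, mu_order) with vx_{k_0} <= ... <= vx_{k_m},
    a[mu - 1] = v(x_{k_mu} - x_{k_{mu-1}}), and mu_order listing mu_1..mu_m.
    """
    proj = [sum(vi * xi for vi, xi in zip(v, x)) for x in xs]
    k = sorted(range(len(xs)), key=lambda idx: proj[idx])
    a = [proj[k[mu]] - proj[k[mu - 1]] for mu in range(1, len(k))]
    mu_order = sorted(range(1, len(k)), key=lambda mu: a[mu - 1])  # stable sort
    return k, a, mu_order

def step2(xs, seed=0, max_tries=100):
    """Step 2: draw v at random until the smallest gap a_{mu_1} is positive."""
    rng = random.Random(seed)
    n = len(xs[0])
    for _ in range(max_tries):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        k, a, mu_order = order_projections(xs, v)
        if a[mu_order[0] - 1] > 0:
            return v, k, a, mu_order
    raise RuntimeError("no separating v found; are two inputs identical?")
```

For the inputs of Example 4.2 with $v = [2\ 1\ {-1}]$, `order_projections` returns `k == [5, 4, 0, 1, 2, 3]` and `mu_order == [2, 1, 3, 4, 5]`, matching the hand computation in that example.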

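Step 3. (For Derivative Control)

3.1. Compute

$$M_2 = \max_{\substack{1 \le i \le \ell \\ 1 \le j \le n}} |v_j| \sum_{\mu=1}^{m} \left| y_{k_\mu i} - y_{k_{\mu-1} i} \right|.$$

3.2. Choose $\delta_2 > 0$ such that $\delta_2 < 2\varepsilon_2 / M_2$.

Set $i = 1$.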
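The tolerance bounds can be computed as sketched below. Step 3 follows the text directly. Step 1 is not stated in this part of the report, so the quantities $M_1 = \max_i \sum_j |y_{ji}|$, $\delta_1 < 2\varepsilon_1/(3M_1)$, and $\alpha = S^{-1}(1 - \delta_1) = \tanh^{-1}(1 - \delta_1)$ used here are an assumption inferred from the worked Step 1 of Examples 4.1a and 4.2. The function names are hypothetical.

```python
import math

def step1_bounds(ys, eps1):
    """Step 1 as inferred from Examples 4.1a and 4.2 (assumption):
    M1 = max_i sum_j |y_ji| and the strict upper bound 2*eps1/(3*M1) for delta1."""
    ell = len(ys[0])
    M1 = max(sum(abs(y[i]) for y in ys) for i in range(ell))
    return M1, 2 * eps1 / (3 * M1)

def step3_bounds(ys, k, v, eps2):
    """Step 3: M2 = max_{i,j} |v_j| * sum_mu |y_{k_mu, i} - y_{k_{mu-1}, i}|,
    and the strict upper bound 2*eps2/M2 for delta2."""
    ell = len(ys[0])
    col = [sum(abs(ys[k[mu]][i] - ys[k[mu - 1]][i]) for mu in range(1, len(k)))
           for i in range(ell)]
    M2 = max(c * abs(vj) for c in col for vj in v)
    return M2, 2 * eps2 / M2

# Example 4.1a: scalar outputs stored as 1-vectors, v = [1], k_mu = mu.
ys = [[0.0], [1.0], [-1.0], [0.0]]
M1, d1_max = step1_bounds(ys, eps1=0.001)
M2, d2_max = step3_bounds(ys, k=[0, 1, 2, 3], v=[1.0], eps2=0.001)
delta1 = 0.0003                  # value chosen in the example (below d1_max)
alpha = math.atanh(1 - delta1)   # S^{-1}(1 - delta1) for S = tanh
print(M1, d1_max, M2, d2_max, alpha)
```

Run on the data of Example 4.1a, this reproduces $M_1 = 2$, $M_2 = 4$, the bounds $0.00033$ and $0.0005$, and $\alpha \approx 4.402$.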

Step 4.

4.1. Set $a = a_{\mu_i}$.

4.2. Choose $\eta > 0$ for an appropriate decay rate of $t\,S'(A_\eta(t)) = t^{-\eta}$. (Note: Small $a$ calls for a low rate of increase of $A_\eta$, thus a low rate of decay of $t^{-\eta}$; i.e., small $\eta$.)

4.3. Set $A = A_\eta$.

4.4. Let $t_\alpha = A^{-1}(\alpha)$ and $t_{\delta_2} = (1/\delta_2)^{1/\eta}$; if $t_{\delta_2}$ is too large, increase $\eta$.

4.5. Let $t^*$ be the value of $t$ where $-\frac{a}{2}t + A(t)$ attains its maximum value.

4.6. Set $\hat{t} = \max\{t_\alpha, t_{\delta_2}, t^*\}$.
(Note: The value $\hat{t}$ has the property that Inequalities 4.10 and 4.12 hold for all $w > \hat{t}$ and the function $-\frac{a}{2}t + A(t)$ is decreasing for $t > \hat{t}$.)

4.7. If $-\frac{a}{2}\hat{t} + A(\hat{t}) \ge 0$, let $t = (2A(\hat{t}) + q)/a$, let $\hat{t} = t$, and repeat Step 4.6.
(Note: Here $q > 0$ and $q \approx 1$. The larger $q$ is, the faster the convergence to a value $\hat{t}$ satisfying $-\frac{a}{2}\hat{t} + A(\hat{t}) < 0$; however, too large a $q$ can lead to an excessively large $\hat{t}$.)

4.8. If $-\frac{a}{2}\hat{t} + A(\hat{t}) < 0$, set $w_{\mu_i} = \hat{t}$.

4.9. If $i = m$, stop.

4.10. If $w_{\mu_i}$ is not too large, set $w_{\mu_j} = w_{\mu_i}$ for all $j > i$. Stop.

4.11. Set $i = i + 1$. Repeat Step 4.
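For a single gap $a = a_{\mu_i}$ with $\eta$ fixed, Steps 4.3 through 4.8 reduce to a short loop. The Python sketch below assumes $S = \tanh$ and the family $A_\eta(t) = \cosh^{-1}(t^{(\eta+1)/2})$ used in the examples of this section, and takes $t^*$ from the closed forms for $\eta = 1, 3$, so only those two values of $\eta$ are supported. The name `choose_weight` and its return format are our own; the "too large" judgments of Steps 4.4 and 4.10 are left to the caller.

```python
import math

def choose_weight(a, alpha, delta2, eta, q=1.0, max_iter=1000):
    """Steps 4.3-4.8 for one gap a > 0 with A = A_eta, eta in {1, 3}.

    Returns (t_hat, history), where history lists (t, g(t)) for each iterate.
    Pass delta2=None to drop the derivative requirement (as in Example 4.1c).
    """
    def A(t):
        return math.acosh(t ** ((eta + 1) / 2))

    def g(t):                                         # left side of Inequality 4.11
        return -(a / 2) * t + A(t)

    t_alpha = math.cosh(alpha) ** (2 / (eta + 1))     # Step 4.4: A^{-1}(alpha)
    t_delta2 = 0.0 if delta2 is None else (1 / delta2) ** (1 / eta)
    if eta == 1:                                      # Step 4.5, closed form
        t_star = math.sqrt(a * a + 4) / a
    elif eta == 3:
        t_star = math.sqrt(8 + math.sqrt(64 + a ** 4)) / a
    else:
        raise ValueError("closed-form t* is only available for eta = 1 or 3")
    t = max(t_alpha, t_delta2, t_star)                # Step 4.6
    history = []
    for _ in range(max_iter):
        history.append((t, g(t)))
        if g(t) < 0:                                  # Step 4.8: accept t_hat
            return t, history
        t = (2 * A(t) + q) / a                        # Step 4.7 update
    raise RuntimeError("no t with g(t) < 0 within max_iter iterations")
```

With the data of Example 4.2 ($\alpha = \tanh^{-1}(0.9999)$, $\delta_2 = 0.001$, $\eta = 3$), the gap $a = 0.9$ takes three iterates and stops near 14, and $a = 1$ stops near 11.6, in line with the tables given there (small differences come from rounding in the hand computation). Each rejected iterate advances by at least $q/a$, which is the mechanism behind Remark 4.2.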

Remark 4.2. Since the function $g(t) = -\frac{a}{2}t + A(t)$ is decreasing on the interval $(t^*, \infty)$ and $\lim_{t\to\infty} g(t) = -\infty$ when $a > 0$, it is easy to see that the iteration in Step 4.6 will yield a value $\hat{t}$ such that $g(\hat{t}) < 0$ in a finite number of steps whenever $q > 0$. To see this, assume $g(t^*) \ge 0$ and let $T \ge t^*$ satisfy $g(T) = 0$. Set $t_{k+1} = (2A(t_k) + q)/a$ ($k = 0, 1, 2, \ldots$), where $t_0$ is any point in $[t^*, T]$. Now, if $t_k \in [t^*, T]$, then $g(t_k) \ge 0$, that is, $2A(t_k) \ge a t_k$; thus, $t_{k+1} \ge t_k + q/a$. Therefore, as long as $t_0, \ldots, t_k \in [t^*, T]$, we have $t_k \ge t_0 + kq/a$. This means that $t_k$ cannot be less than $T$ for all $k > 0$. Thus, after a finite number of iterations, $t_k$ leaves the interval $[t^*, T]$ and $g(t_k) < 0$. ////

In the following examples, $\eta$ will be either 1 or 3 so that we may determine $t^*_\eta(a)$ from Equations 4.13.

Example 4.1a. The points of interpolation are $\{(0, 0), (1, 1), (1.1, -1), (2, 0)\}$. So let $x_0 = 0$, $x_1 = 1$, $x_2 = 1.1$, $x_3 = 2$, and $y_0 = 0$, $y_1 = 1$, $y_2 = -1$, $y_3 = 0$. Let $\varepsilon_1 = \varepsilon_2 = 0.001$.
Step 1.

$M_1 = 2$, $\delta_1 < \dfrac{2\varepsilon_1}{3M_1} = 0.00033$

$\delta_1 = 0.0003$

$\alpha = S^{-1}(1 - \delta_1) = 4.402$

Step 2.

Since $x_0 < x_1 < x_2 < x_3$, let $v = 1$, so $k_\mu = \mu$ ($0 \le \mu \le 3$).

$a_1 = x_1 - x_0 = 1$

$a_2 = x_2 - x_1 = 0.1$

$a_3 = x_3 - x_2 = 0.9$

Since $a_2 < a_3 < a_1$, we have $\mu_1 = 2$, $\mu_2 = 3$, $\mu_3 = 1$.

Step 3.

$M_2 = |y_1 - y_0| + |y_2 - y_1| + |y_3 - y_2| = 1 + 2 + 1 = 4$

$\delta_2 < 2\varepsilon_2/4 = 0.0005$

$\delta_2 = 0.0004$

Let $i = 1$.

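Step 4.

$a = a_{\mu_1} = a_2 = 0.1$

Since $a$ is "small," choose a "small" $\eta$.

Let $\eta = 1$.

$A(t) = A_1(t) = \cosh^{-1}(t)$ ($t > 1$)

$A^{-1}(\alpha) = \cosh(\alpha)$ ($\alpha \ge 0$)

$t_\alpha = \cosh(\alpha) = \cosh(4.402) = 40.81$

$t_{\delta_2} = 1/\delta_2 = 2500$. This value of $t_{\delta_2}$ is excessively large. We must increase $\eta$.

Let $\eta = 3$.

$A(t) = A_3(t) = \cosh^{-1}(t^2)$

$A^{-1}(\alpha) = \sqrt{\cosh(\alpha)}$

$t_\alpha = \sqrt{\cosh(\alpha)} = 6.38$ [a smaller value of $t_\alpha$ means that $A(t)$ is increasing faster]

$t_{\delta_2} = (1/\delta_2)^{1/3} = \sqrt[3]{2500} = 13.57$. This value of $t_{\delta_2}$ is acceptable.

$t^* = t^*_3(a) \approx 4/a = 40$

$\hat{t} = 40$.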

The following table shows the results of the iterations involved in Step 4.6. We shall use $q = 1$ and $g(t) = -\frac{t}{20} + A(t)$.

$$\begin{array}{ccccc}
t & -\frac{a}{2}t & A(t) & g(t) & t_{\text{new}} = 20A(t) + 10 \\ \hline
40 & -2 & 8.07 & 6.07 & 171.4 \\
171.4 & -8.57 & 10.98 & 2.41 & 229.6 \\
229.6 & -11.48 & & -0.09 &
\end{array}$$

////

Let $w_{\mu_1} = w_2 = 230$.

Since $a_2$ is smaller than all the other $a_\mu$, the value of $w_2$ will work for all of the other weights; however, we consider $w_2$ too large, so we will repeat Step 4 with $i = 2$.

Step 4 with $i = 2$.

$a = a_{\mu_2} = a_3 = 0.9$

Set $\eta = 3$ (in order to meet the derivative requirement with an acceptable value of $w_3$).

$t_\alpha$ and $t_{\delta_2}$ are as before (because $\eta$ did not change).

$t^* = t^*_3(a) = 4.45$

$\hat{t} = 13.57$ and $g(\hat{t}) < 0$.

Let $w_{\mu_2} = w_3 = 13.57$.

Note that since $\hat{t} = t_{\delta_2}$, it is the derivative requirement that will determine all of the remaining weights (i.e., $w_1$), even if the remaining $a_\mu$ are much larger than $a_3$.

Let $w_{\mu_3} = w_1 = 13.57$. Stop.

The vector $\hat{w} \approx [13.57, 230, 13.57]$ satisfies Theorem 4.1 for the data of this example.

To complete the example, we shall determine a neural net mapping $T_w : \mathbb{R} \to \mathbb{R}$ that interpolates through the data with an error less than $\varepsilon_1 = 0.001$ and with derivative less than $\varepsilon_2 = 0.001$ at the interpolation points. We shall use $w = \hat{w}$.

Recall that $T_w(x) = y_0 + \frac{1}{2} Y \left[S_m(L_w(x)) - S_m(L_w(x_0))\right]$, where

$$L_w(x) = \begin{bmatrix} w_1(x - x_1) + A(w_1) \\ w_2(x - x_2) + A(w_2) \\ w_3(x - x_3) + A(w_3) \end{bmatrix}, \qquad Y = [\,y_1 - y_0 \quad y_2 - y_1 \quad y_3 - y_2\,].$$

Note that the function $A$ in the $i$th component of $L_w(x)$ is the function used in the computation of $w_i$ ($1 \le i \le 3$). The result is

$$T_w(x) = \frac{1}{2}\left[ S(13.57x - 7.67) - 2S(230x - 241.43) + S(13.57x - 21.24) \right].$$

////
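The printed network can be checked directly. The Python sketch below evaluates the expression for $T_w$ just given (with $S = \tanh$) and its derivative at the four interpolation points, for comparison against $\varepsilon_1 = \varepsilon_2 = 0.001$. The helper names `dS`, `T`, `dT`, and `UNITS` are hypothetical.

```python
import math

def dS(x):
    """sech(x)**2 for S = tanh, computed without overflow for large |x|."""
    e = math.exp(-2.0 * abs(x))
    return 4.0 * e / (1.0 + e) ** 2

# (coefficient of S, slope, bias) for each hidden unit of the printed T_w:
# T_w(x) = 0.5*S(13.57x - 7.67) - S(230x - 241.43) + 0.5*S(13.57x - 21.24)
UNITS = [(0.5, 13.57, -7.67), (-1.0, 230.0, -241.43), (0.5, 13.57, -21.24)]

def T(x):
    return sum(c * math.tanh(w * x + b) for c, w, b in UNITS)

def dT(x):
    return sum(c * w * dS(w * x + b) for c, w, b in UNITS)

points = [(0.0, 0.0), (1.0, 1.0), (1.1, -1.0), (2.0, 0.0)]
for x, y in points:
    print(f"x={x:4.1f}  error={abs(T(x) - y):.2e}  |T'(x)|={abs(dT(x)):.2e}")
```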


Example 4.1b. This example is the same as Example 4.1a. We want to show that, by working directly with Inequalities 4.3 and 4.6 instead of the shortcut presented in Proposition 4.1, Theorem 4.1 may be satisfied with smaller weights. We focus on the second weight $w_2$. A simple calculation shows that $w_2 = 185$ satisfies Inequalities 4.2, 4.3, 4.5, and 4.6 with $A(t) = \cosh^{-1}(t^2)$, i.e., $\eta = 3$. Moreover, $w = [13.57, 185, 13.57]$ satisfies Theorem 4.1. Note that $w_2$ is smaller than in Example 4.1a. ////
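The "simple calculation" for $w_2 = 185$ can be reproduced numerically. The sketch below assumes the values of Example 4.1a ($\alpha = \tanh^{-1}(1 - 0.0003)$, $\delta_2 = 0.0004$, $v(x_1 - x_2) = -0.1$) and $A(t) = \cosh^{-1}(t^2)$. It checks Inequalities 4.2, 4.3, 4.5, and 4.6 for the second hidden unit, written as $A(w) > \alpha$, $w\,v(x_1 - x_2) + A(w) < -\alpha$, $0 < w\,S'(A(w)) < \delta_2$, and $0 < w\,S'(w\,v(x_1 - x_2) + A(w)) < \delta_2$, and it scans integer weights for the first one that passes all four. The name `unit_ok` is hypothetical.

```python
import math

def dS(x):
    """sech(x)**2 for S = tanh, computed without overflow."""
    e = math.exp(-2.0 * abs(x))
    return 4.0 * e / (1.0 + e) ** 2

ALPHA = math.atanh(1 - 0.0003)   # Example 4.1a, Step 1
DELTA2 = 0.0004                  # Example 4.1a, Step 3
GAP = 1.0 - 1.1                  # v(x_1 - x_2) with v = 1

def A(t):
    return math.acosh(t * t)     # A_3(t) = arccosh(t^2)

def unit_ok(w):
    """Inequalities 4.2, 4.3, 4.5, 4.6 for the second hidden unit."""
    return (A(w) > ALPHA                                   # 4.2
            and w * GAP + A(w) < -ALPHA                    # 4.3
            and 0 < w * dS(A(w)) < DELTA2                  # 4.5
            and 0 < w * dS(w * GAP + A(w)) < DELTA2)       # 4.6

smallest = next(w for w in range(1, 1000) if unit_ok(w))
print(unit_ok(185), smallest)
```

The scan reports the smallest integer weight passing all four tests, which can be compared with the value 185 quoted above; Inequality 4.6 is the binding constraint near that value.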

Example 4.1c. Now let us consider Example 4.1a without the requirement on the derivative. Recall that in Step 4 we were forced to increase $\eta$ from 1 to 3 in order to satisfy the requirement on the derivative. Without this requirement, we can use $\eta = 1$ to solve the interpolation problem with a smaller weight $w_2$. Again, we focus only on the second weight and the iterations involved in Step 4.6 with $q = 1$ and

$A(t) = A_1(t) = \cosh^{-1}(t)$

$\alpha = 4.402$

$t_\alpha = 40.81$

$t^*(a) \approx 2/a = 20$

$\hat{t} = 41$.


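$$\begin{array}{ccccc}
t & -\frac{a}{2}t & A(t) & g(t) & t_{\text{new}} = 20A(t) + 10 \\ \hline
41 & -2.05 & 4.406 & 2.36 & 98.12 \\
98.12 & -4.91 & 5.279 & 0.37 & 115.6 \\
115.6 & -5.78 & 5.44 & -0.33 &
\end{array}$$

////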

The interpolation problem can be solved with $w_2 = 116$. Moreover, if we work directly with Inequality 4.3, we find that $w_2 = 100$ also will solve the interpolation problem. ////
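The iteration of Example 4.1c and the direct check of $w_2 = 100$ can be reproduced with a few lines of Python. This illustrative sketch assumes $S = \tanh$, $A(t) = \cosh^{-1}(t)$ ($\eta = 1$), $a = 0.1$, $q = 1$, the starting point $\hat{t} = 41$ used above, and $\alpha = \tanh^{-1}(1 - 0.0003)$ from Example 4.1a; variable names are ours.

```python
import math

a, q = 0.1, 1.0
alpha = math.atanh(1 - 0.0003)       # Example 4.1a, Step 1
A = math.acosh                       # A_1(t) = arccosh(t)

def g(t):
    """Left side of Inequality 4.11: -(a/2) t + A(t)."""
    return -(a / 2) * t + A(t)

t = 41.0                             # starting value t_hat from the example
rows = []
while True:
    rows.append((t, g(t)))
    if g(t) < 0:
        break
    t = (2 * A(t) + q) / a           # update of Step 4.7
for t_k, g_k in rows:
    print(f"t = {t_k:7.2f}   g(t) = {g_k:6.3f}")

# Direct check of Inequalities 4.2 and 4.3 for w2 = 100 (gap v(x1 - x2) = -0.1).
w2 = 100.0
print(A(w2) > alpha, w2 * (-0.1) + A(w2) < -alpha)
```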

Example 4.2. The six inputs of this example belong to $\mathbb{R}^3$; $x_0$ through $x_5$ are, respectively, $[0\ 0\ 0]^T$, $[0\ 1\ 0]^T$, $[1\ 0\ 0]^T$, $[1\ 1\ 0]^T$, $[0\ 0\ 0.1]^T$, and $[0\ 0\ 1]^T$. The outputs $y_0$ through $y_5$ belong to $\mathbb{R}^2$; they are $[0\ 0]^T$, $[1\ 0]^T$, $[0\ 1]^T$, $[0\ 0]^T$, $[1\ 1]^T$, and $[0\ 3]^T$. We wish to interpolate through the points $(x_i, y_i)$, $0 \le i \le 5$, with an error bound $\varepsilon_1 = 0.001$ and a derivative less than $\varepsilon_2 = 0.01$ at the interpolation points.

Step 1.

$M_1 = \max\{2, 5\} = 5$, $\delta_1 < \dfrac{2\varepsilon_1}{3M_1} = 0.00013$

$\delta_1 = 0.0001$

$\alpha = S^{-1}(1 - \delta_1) = 4.952$

Step 2.

Let $v = [2\ 1\ {-1}]$ to get

$vx_0 = 0$, $vx_1 = 1$, $vx_2 = 2$, $vx_3 = 3$, $vx_4 = -0.1$, $vx_5 = -1$.

Since $vx_5 < vx_4 < vx_0 < vx_1 < vx_2 < vx_3$, we have

$k_0 = 5$, $k_1 = 4$, $k_2 = 0$, $k_3 = 1$, $k_4 = 2$, $k_5 = 3$.

$a_1 = v(x_{k_1} - x_{k_0}) = vx_4 - vx_5 = -0.1 - (-1) = 0.9$

$a_2 = v(x_{k_2} - x_{k_1}) = vx_0 - vx_4 = 0 - (-0.1) = 0.1$

$a_3 = v(x_{k_3} - x_{k_2}) = vx_1 - vx_0 = 1 - 0 = 1$

$a_4 = v(x_{k_4} - x_{k_3}) = vx_2 - vx_1 = 2 - 1 = 1$

$a_5 = v(x_{k_5} - x_{k_4}) = vx_3 - vx_2 = 3 - 2 = 1$

Since $a_2 < a_1 < a_3 = a_4 = a_5$, we have $\mu_1 = 2$, $\mu_2 = 1$, $\mu_3 = 3$, $\mu_4 = 4$, $\mu_5 = 5$.

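Step 3.

Consider the matrix

$$M = \left( |y_4 - y_5| + |y_0 - y_4| + |y_1 - y_0| + |y_2 - y_1| + |y_3 - y_2| \right) [\,2\ \ 1\ \ 1\,],$$

where the absolute values of vectors are taken componentwise and $[2\ 1\ 1] = |v|$. By inspection, we get $M_2 = 10$ (the largest entry of $M$), so $\delta_2 < 2\varepsilon_2/M_2 = 0.002$.

Let $\delta_2 = 0.001$.

Let $i = 1$.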
Step 4.

$a = a_{\mu_1} = a_2 = 0.1$

Let $\eta = 3$, $A(t) = A_3(t) = \cosh^{-1}(t^2)$, $A^{-1}(\alpha) = \sqrt{\cosh(\alpha)}$.

$t_\alpha = 8.41$

$t_{\delta_2} = (1/\delta_2)^{1/3} = 10$

$t^* = t^*_3(a) \approx 40$

$w_2 = 230$ (see Example 4.1a).

Let $i = 2$.

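Step 4.

$a = a_{\mu_2} = a_1 = 0.9$

$\eta = 3$

$t_\alpha = 8.41$, $t_{\delta_2} = 10$ (as before)

$t^* = t^*_3(a) = 4.45$

$\hat{t} = \max\{t_\alpha, t_{\delta_2}, t^*\} = t_{\delta_2} = 10$

$$\begin{array}{ccccc}
t & -\frac{a}{2}t & A(t) & g(t) & t_{\text{new}} = 2.22A(t) + 1.11 \\ \hline
10 & -4.5 & 5.29 & 0.8 & 12.85 \\
12.85 & -5.78 & 5.8 & 0.015 & 13.98 \\
13.98 & -6.29 & 5.97 & -0.32 &
\end{array}$$

////

$w_1 = 14$

Let $i = 3$.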

Step 4.

$a = a_{\mu_3} = a_3 = 1$

$\eta = 3$

$t_\alpha = 8.41$ and $t_{\delta_2} = 10$ (as before)

$t^* = t^*_3(a) \approx 4$

$\hat{t} = 10$

$$\begin{array}{ccccc}
t & -\frac{a}{2}t & A(t) & g(t) & t_{\text{new}} = 2A(t) + 1 \\ \hline
10 & -5.0 & 5.29 & 0.29 & 11.58 \\
11.58 & -5.79 & 5.59 & -0.2 &
\end{array}$$

////

Set $w_3 = 11.6$, and set all the rest of the weights equal to 11.6:

$w_4 = 11.6$

$w_5 = 11.6$

$\hat{w} = [14, 230, 11.6, 11.6, 11.6]^T$. ////
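A numerical check of the resulting network for Example 4.2 is sketched below. As we read Equation 4.1 after the relabeling of Step 2, the network is $T_w(x) = y_{k_0} + \tfrac{1}{2}\sum_{\mu} (y_{k_\mu} - y_{k_{\mu-1}})\,[S(w_\mu v(x - x_{k_\mu}) + A(w_\mu)) - S(w_\mu v(x_{k_0} - x_{k_\mu}) + A(w_\mu))]$, with $A(t) = \cosh^{-1}(t^2)$ for every unit, and its Jacobian entries are $\tfrac{1}{2}\sum_\mu (y_{k_\mu i} - y_{k_{\mu-1} i})\, w_\mu S'(\cdot)\, v_j$. The sketch assumes $y_5 = [0\ 3]^T$, the value consistent with $M_1 = 5$ and $M_2 = 10$ in Steps 1 and 3; all function names are hypothetical.

```python
import math

def dS(x):
    """sech(x)**2 for S = tanh, computed without overflow."""
    e = math.exp(-2.0 * abs(x))
    return 4.0 * e / (1.0 + e) ** 2

X = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 0.1], [0, 0, 1]]
Y = [[0, 0], [1, 0], [0, 1], [0, 0], [1, 1], [0, 3]]   # y_5 = [0 3]^T assumed
v = [2, 1, -1]
k = [5, 4, 0, 1, 2, 3]                                 # relabeling from Step 2
w = [14, 230, 11.6, 11.6, 11.6]                        # w_1 .. w_5

def A(t):
    return math.acosh(t * t)                           # A_3 for every unit

def vx(x):
    return sum(vi * xi for vi, xi in zip(v, x))

def pre(mu, x):
    """Pre-activation of hidden unit mu (1-based): w_mu v(x - x_{k_mu}) + A(w_mu)."""
    wm = w[mu - 1]
    return wm * (vx(x) - vx(X[k[mu]])) + A(wm)

def T(x):
    out = [float(c) for c in Y[k[0]]]
    for mu in range(1, len(k)):
        s = math.tanh(pre(mu, x)) - math.tanh(pre(mu, X[k[0]]))
        for i in range(len(out)):
            out[i] += 0.5 * (Y[k[mu]][i] - Y[k[mu - 1]][i]) * s
    return out

def jacobian(x):
    J = [[0.0] * len(v) for _ in range(len(Y[0]))]
    for mu in range(1, len(k)):
        c = w[mu - 1] * dS(pre(mu, x))
        for i in range(len(Y[0])):
            for j in range(len(v)):
                J[i][j] += 0.5 * (Y[k[mu]][i] - Y[k[mu - 1]][i]) * c * v[j]
    return J

max_err = max(abs(t - y) for x, yv in zip(X, Y) for t, y in zip(T(x), yv))
max_der = max(abs(e) for x in X for row in jacobian(x) for e in row)
print(f"max error = {max_err:.2e}, max |Jacobian entry| = {max_der:.2e}")
```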


The  last  example  illustrates  the  modular  features  of  the  technique  and  shows 
how  to  estimate  the  error  directly  from  Lemma  4.1. 

Example 4.3. Consider the "exclusive or" problem, that is, a map that interpolates through $(x_i, y_i)$, $i = 0, 1, 2, 3$, where $x_0 = [0\ 0]^T$, $x_1 = [0\ 1]^T$, $x_2 = [1\ 0]^T$, $x_3 = [1\ 1]^T$, $y_0 = 0 = y_3$, and $y_1 = y_2 = 1$. We can assemble a network that implements the exclusive or problem using "part" of the network in Example 4.2.

Note that the matrix $v = [2\ 1]$ maps the vector $x_i$ into $i$ for each $i = 0, 1, 2, 3$; the matrix $v$ of the previous example achieved the same result. Thus, the previous network already contains a subnetwork (with three hidden units) that maps the integers 0, 1, 2, 3 into the desired outputs $y_0, y_1, y_2, y_3$. Consequently, all we need to do is to use the correct matrix $Y$. If $v = [2\ 1]$, $w = [11.6, 11.6, 11.6]^T$, and $Y = [1\ 0\ {-1}]$, we have a net that implements the exclusive or problem.

Let us use Lemma 4.1 to estimate the error. Since $w_i = 11.6$ and $A(w_i) = 5.59$, we know that Inequalities 4.2 and 4.3 hold with $\alpha = 5.58$. Since $1 - S(\alpha) = 0.000028$, we conclude that Inequality 4.4 holds with $\delta_1 = 0.00003$. By Lemma 4.1, the error is bounded by 0.00009. The mapping is given by

$$T(x) = \frac{1}{2}\left[ S(11.6\,vx - 6.01) - S(11.6\,vx - 29.21) \right] - 0.000005 \qquad (x \in \mathbb{R}^2),$$

with $v = [2\ 1]$, and

$T(x_0) = 0$, $T(x_1) = 0.99998 = T(x_2)$, $T(x_3) = 0.000009$. ////
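The exclusive-or mapping above can be evaluated directly. The sketch below uses the printed formula for $T$ with $S = \tanh$ (variable names are ours) and compares its values at $x_0, \ldots, x_3$ with the targets and with the error bound 0.00009 obtained from Lemma 4.1.

```python
import math

v = (2, 1)

def T(x):
    """Printed XOR network: 0.5*[S(11.6 vx - 6.01) - S(11.6 vx - 29.21)] - 0.000005."""
    vx = v[0] * x[0] + v[1] * x[1]
    return 0.5 * (math.tanh(11.6 * vx - 6.01) - math.tanh(11.6 * vx - 29.21)) - 0.000005

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
for x, y in data:
    print(x, f"{T(x):.6f}", f"error = {abs(T(x) - y):.1e}")
```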

Remark 4.3. The reader might have noticed that when the value of $\eta$ is determined and fixed by the requirements on the derivative at the interpolation points, as was the case in some of the examples above, the algorithm is more efficient if Steps 4.2 through 4.4 are performed immediately after Step 3.2, for then those steps are performed only once. ////
APPENDIX


Proof of Lemma 3.1. Let $\ell_{ij}$ denote the unique line through $x_i$ and $x_j$, and let $H_{ij}$ be the hyperplane through the origin consisting of all vectors in $\mathbb{R}^n$ that are perpendicular to $\ell_{ij}$, for $0 \le i \le m$, $0 \le j \le m$, and $i \ne j$. Let $H = \bigcup \{H_{ij} : i \ne j,\ 0 \le i \le m,\ 0 \le j \le m\}$. If $v^T \notin H$, then $\{vx_i : 0 \le i \le m\}$ is a set of distinct numbers; for if not, say $vx_i = vx_j$, then $v(x_i - x_j) = 0$, which implies that $v^T$ is perpendicular to the line $\ell_{ij}$; thus, $v^T \in H$, a contradiction. ////

Remark A.1. Since the set $H$ has Lebesgue measure zero, it follows that all vectors $v^T$ in $\mathbb{R}^n$ satisfy the hypothesis of the lemma except for those on a set of measure zero. ////

Proof of Lemma 3.2. We must show that for every $r_k \ge 0$

$$\lim_{t\to\infty} A_k(t) = \infty \qquad (A\text{-}1)$$

and

$$\lim_{t\to\infty} t\,S'(at + A_k(t)) = 0 \quad \text{for all } a \ne 0, \qquad (A\text{-}2)$$

where $A_k$ satisfies Condition 3.9 and Condition 3.3 with $a < 0$.

Fix $r_k \ge 0$ and consider $A_k : (t_k, \infty) \to (0, \infty)$. If Equation A-1 does not hold, there exist $M > 0$ and an unbounded sequence $p_1 < p_2 < \cdots < p_n < \cdots$ such that $A_k(p_n) \le M$ for all $n \ge 1$. Since $S'$ is nonincreasing on $(0, \infty)$, we have $S'(A_k(p_n)) \ge S'(M)$ for all $n \ge 1$. Consequently,

$$p_n\,S'(A_k(p_n)) \ge p_n\,S'(M) \to \infty \quad \text{as } n \to \infty,$$

which contradicts Condition 3.9. Therefore, Equation A-1 holds.


To prove Equation A-2, first assume that $r_k = 0$. Since $S'$ is nonincreasing on $(0, \infty)$, $S'(at + A_k(t)) \le S'(A_k(t))$ when $a > 0$ and $t > \max\{0, t_k\}$. Thus, when $a > 0$, Equation A-2 trivially follows from Condition 3.9 with $r_k = 0$. When $a < 0$, Condition 3.3 gives

$$\lim_{t\to\infty} \left[ \frac{1}{2}at + A_k(t) \right] = -\infty.$$

Certainly, then, there exists $T > t_k$ such that $\frac{1}{2}at + A_k(t) < 0$ for all $t > T$; that is, $-ta - A_k(t) > A_k(t) > 0$ for all $t > T$. Since $S'$ is an even function and nonincreasing on $(0, \infty)$, it follows from the last inequality that

$$t\,S'(ta + A_k(t)) = t\,S'(-ta - A_k(t)) \le t\,S'(A_k(t)) \quad \text{for all } t > T.$$

Hence, when $a < 0$, Equation A-2 also follows from Condition 3.9 with $r_k = 0$.

Next, fix $r_k > 0$. To establish Equation A-2, we shall show that to every $\varepsilon > 0$ there corresponds a $T \ge t_k$ such that $t\,S'(ta + A_k(t)) < \varepsilon$ for all $t > T$.

If $r_0 = \varepsilon/2$, the hypothesis of the lemma gives us a function $A_0 : (t_0, \infty) \to (0, \infty)$ such that

$$t\,S'(A_0(t)) = \frac{\varepsilon}{2} \quad \text{for all } t > t_0. \qquad (A\text{-}3)$$

If $a > 0$, Condition 3.3 applied to $A_0$ shows that there exists $T_0 \ge t_0$ such that $t(-a) + A_0(t) < 0$ or, equivalently, $ta > A_0(t)$ for all $t > T_0$. Since $S'$ is nonincreasing on $(0, \infty)$ and $A_k$ is positive valued, it follows from the last inequality that

$$S'(ta + A_k(t)) \le S'(A_0(t)) \quad \text{for all } t > \max\{T_0, t_k\}.$$

Let $T = \max\{T_0, t_k\}$. Equation A-3 and the last inequality imply $t\,S'(ta + A_k(t)) \le \varepsilon/2 < \varepsilon$ for all $t > T$. This proves Equation A-2 for $a > 0$ and $r_k > 0$.

Now assume that $a < 0$. Condition 3.3 applied to $A_k$ and $A_0$ shows that there exist $T_1 \ge t_k$ and $T_2 \ge t_0$ such that

$$\frac{1}{2}at + A_k(t) < 0 \quad \text{for all } t > T_1,$$

$$\frac{1}{2}at + A_0(t) < 0 \quad \text{for all } t > T_2.$$

Consequently, if $T = \max\{T_1, T_2\}$, then $-ta - A_k(t) > A_0(t) > 0$ for all $t > T$. And, as before, since $S'$ is an even function and nonincreasing on $(0, \infty)$, we have

$$t\,S'(at + A_k(t)) = t\,S'(-at - A_k(t)) \le t\,S'(A_0(t)) = \frac{\varepsilon}{2} < \varepsilon \quad \text{for } t > T.$$

Therefore, Equation A-2 holds when $a < 0$ and $r_k > 0$. This completes the proof of Lemma 3.2. ////

Remark A-2. For the purpose of this remark, denote by $A_r$ a function $A_r : (t_r, \infty) \to (0, \infty)$ that satisfies

$$t\,S'(A_r(t)) = r \quad \text{for all } t > t_r, \qquad (A\text{-}4)$$

where $r > 0$, and let $A_0 : (t_0, \infty) \to (0, \infty)$ satisfy

$$\lim_{t\to\infty} t\,S'(A_0(t)) = 0.$$

It is easy to see that the functions $A_r$ above are nondecreasing for all $r > 0$ whenever $S'$ is nonincreasing on $(0, \infty)$. This fact was not needed in the development of the theory in Section 3; however, it may prove useful when implementing the techniques presented in this paper. To see that $A_r$ is nondecreasing when $r > 0$, assume otherwise: assume there exist $a < b$, both in $(t_r, \infty)$, such that $A_r(a) > A_r(b)$. Then $S'(A_r(a)) \le S'(A_r(b))$, which implies

$$r = a\,S'(A_r(a)) \le a\,S'(A_r(b)) < b\,S'(A_r(b)) = r,$$

a contradiction.

The function $A_0$ can be defined in such a way that it too is a nondecreasing function, provided $t_r$ does not increase as $r$ decreases. This can be done as follows. Suppose that, for $0 < r \le r_0$, $t_r$ does not increase as $r$ decreases. Let $f : (t_{r_0}, \infty) \to (0, r_0)$ be a decreasing function such that $\lim_{t\to\infty} f(t) = 0$. Define $A_0 : (t_{r_0}, \infty) \to (0, \infty)$ by

$$A_0(t) = A_{f(t)}(t) \quad \text{for } t > t_{r_0}.$$

Note that $f(t) < r_0$ for all $t > t_{r_0}$ implies $t_{f(t)} \le t_{r_0}$ for all $t > t_{r_0}$. Therefore, $A_{f(t)}(t)$ is well defined for all $t > t_{r_0}$; that is, $t$ is in the domain of $A_{f(t)}$ for all $t > t_{r_0}$. We claim that $A_0$ is a nondecreasing function. The proof is by contradiction: if $A_0(a) > A_0(b)$ for some $a < b$ in the domain of $A_0$, then, by definition of $A_0$, we have $A_{f(a)}(a) > A_{f(b)}(b)$. Since $S'$ is nonincreasing on $(0, \infty)$, with the aid of Equation A-4 we conclude

$$f(a) = a\,S'(A_{f(a)}(a)) \le a\,S'(A_{f(b)}(b)) < b\,S'(A_{f(b)}(b)) = f(b),$$

which contradicts the fact that $f$ is a decreasing function. Note that $\lim_{t\to\infty} t\,S'(A_0(t)) = \lim_{t\to\infty} t\,S'(A_{f(t)}(t)) = \lim_{t\to\infty} f(t) = 0$. ////

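Proof of Lemma 4.1. Fix $j \in \{0, 1, 2, \ldots, m\}$ and $w \in X$. Assume $w$ satisfies Inequalities 4.2 and 4.3. Let $z_j = S_m(L_w(x_j)) - S_m(L_w(x_0))$ and let $z_{jk}$ denote the $k$th component of $z_j$ ($1 \le k \le m$). Set $z_{j0} = 2$ and $z_{j(m+1)} = 0$. Equation 4.1 gives

$$T_w(x_j) - y_j = y_0 + \frac{1}{2} Y z_j - y_j = y_0 + \frac{1}{2}\sum_{k=1}^{m} (y_k - y_{k-1})\, z_{jk} - y_j$$

$$= \begin{cases} \dfrac{1}{2}\displaystyle\sum_{\substack{k=0 \\ k \ne j}}^{m} y_k \left(z_{jk} - z_{j(k+1)}\right) + \dfrac{1}{2}\, y_j \left(z_{jj} - z_{j(j+1)} - 2\right) & \text{if } j < m, \\[3ex] \dfrac{1}{2}\displaystyle\sum_{k=0}^{m-1} y_k \left(z_{jk} - z_{j(k+1)}\right) + \dfrac{1}{2}\, y_m \left(z_{mm} - 2\right) & \text{if } j = m. \end{cases} \qquad (A\text{-}5)$$

Hence, it suffices to prove Inequalities A-6 through A-9 for $1 \le j \le m$. Note that the interpolation through $(x_0, y_0)$ is exact.

$$|2 - z_{j1}| < 2\delta_1 \qquad (A\text{-}6)$$

$$|z_{jk} - z_{j(k+1)}| < 2\delta_1 \quad \text{for } 1 \le k < m,\ k \ne j \qquad (A\text{-}7)$$

$$|z_{jj} - z_{j(j+1)} - 2| < 3\delta_1 \quad \text{if } j < m \qquad (A\text{-}8)$$

$$|z_{jm}| < \delta_1 \ \text{ if } j < m, \quad \text{and} \quad |z_{mm} - 2| < 2\delta_1. \qquad (A\text{-}9)$$

Clearly, Equation A-5 and Inequalities A-6 through A-9 imply Inequality 4.4.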
Since $v(x_j - x_k) \ge 0$ for $1 \le k \le j$, Inequality 4.2 implies

$$w_k v(x_j - x_k) + A_k(w_k) > \alpha \quad \text{for } 1 \le k \le j.$$

Therefore, by the choice of $\alpha$, the $k$th component of $S_m(L_w(x_j))$ is within $\delta_1$ of 1 for $1 \le k \le j$; that is,

$$1 - \delta_1 < S(w_k v(x_j - x_k) + A_k(w_k)) < 1 \quad \text{for } 1 \le k \le j. \qquad (A\text{-}10)$$

Since $v(x_j - x_k) \le v(x_{k-1} - x_k)$ for $0 \le j < k \le m$, Inequality 4.3 implies

$$w_k v(x_j - x_k) + A_k(w_k) < -\alpha \quad \text{for } 0 \le j < k \le m.$$

Therefore, by the choice of $\alpha$, the $k$th component of $S_m(L_w(x_j))$ is within $\delta_1$ of $-1$ for $0 \le j < k \le m$; that is,

$$-1 < S(w_k v(x_j - x_k) + A_k(w_k)) < -1 + \delta_1 \quad \text{for } 0 \le j < k \le m. \qquad (A\text{-}11)$$

The definitions of $z_j$, $L_w$, and $z_{jk}$ give

$$|z_{jk} - z_{j(k+1)}| \le \left|S_m(L_w(x_j))_k - S_m(L_w(x_j))_{k+1}\right| + \left|S_m(L_w(x_0))_k - S_m(L_w(x_0))_{k+1}\right| \quad \text{for } 1 \le k < m, \qquad (A\text{-}12)$$

where $S_m(L_w(x_j))_k$ denotes the $k$th component of $S_m(L_w(x_j))$ for all $k$ and $j$.

The second term on the right-hand side (RHS) of Inequality A-12 is less than $\delta_1$ for $1 \le k < m$, as Inequality A-11 with $j = 0$ shows. The first term on the RHS of Inequality A-12 is less than $\delta_1$, as shown by Inequality A-10 when $1 \le k < j$ and by Inequality A-11 when $j < k < m$. This proves Inequality A-7.

Next, note that

$$|z_{jj} - z_{j(j+1)} - 2| \le \left|S_m(L_w(x_j))_j - S_m(L_w(x_j))_{j+1} - 2\right| + \left|S_m(L_w(x_0))_j - S_m(L_w(x_0))_{j+1}\right|. \qquad (A\text{-}13)$$

As before, the second term on the RHS of Inequality A-13 is less than $\delta_1$ if $j < m$. By Inequality A-10 with $k = j$ and Inequality A-11 with $k = j + 1$, one concludes that

$$2 - 2\delta_1 < S_m(L_w(x_j))_j - S_m(L_w(x_j))_{j+1} < 2,$$



NWC  TP  7191 


which shows that the first term on the RHS of Inequality A-13 is less than $2\delta_1$. This proves Inequality A-8.

Consider now $z_{j1}$:

$$0 < 2 - z_{j1} = 2 - \left[S_m(L_w(x_j))_1 - S_m(L_w(x_0))_1\right] < 2 - (1 - \delta_1) + (-1 + \delta_1) = 2\delta_1,$$

where we used Inequality A-10 with $k = 1$ and Inequality A-11 with $j = 0$. This proves Inequality A-6.

If $j < m$, then Inequality A-11 implies

$$|z_{jm}| = \left|S_m(L_w(x_j))_m - S_m(L_w(x_0))_m\right| < \delta_1.$$

If $j = m$, then Inequality A-10 with $k = j = m$ and Inequality A-11 with $j = 0$ and $k = m$ give

$$0 > z_{mm} - 2 = S_m(L_w(x_m))_m - S_m(L_w(x_0))_m - 2 > (1 - \delta_1) + (1 - \delta_1) - 2 = -2\delta_1.$$

Therefore Inequality A-9 holds. This completes the proof of Lemma 4.1. ////

Proof of Lemma 4.2. If $T_w(x)$ is given by Equation 4.1 then, by Equation 2.1,

$$[T_w'(x_k)]_{ij} = \frac{1}{2}\, Y_i\, S_m'(L_w(x_k))\, w\, v_j \qquad (0 \le k \le m,\ 1 \le i \le \ell,\ 1 \le j \le n),$$

where $Y_i$ denotes the $i$th row of the matrix $Y$ ($1 \le i \le \ell$) and $v_j$ denotes the $j$th component of $v$ ($1 \le j \le n$). This leads to

$$[T_w'(x_k)]_{ij} = \frac{1}{2} \sum_{\mu=1}^{m} \left(y_{\mu i} - y_{(\mu-1)i}\right) w_\mu\, S'\!\left(w_\mu v(x_k - x_\mu) + A_\mu(w_\mu)\right) v_j.$$

Hence, it suffices to show

$$0 < w_\mu\, S'\!\left(w_\mu v(x_k - x_\mu) + A_\mu(w_\mu)\right) < \delta_2 \qquad (1 \le \mu \le m,\ 0 \le k \le m). \qquad (A\text{-}14)$$

Inequality A-14 reduces to Inequality 4.5 when $k = \mu$ ($1 \le \mu \le m$), and it is implied by Inequality 4.5 when $k > \mu$ ($1 \le \mu \le m$) because $S'$ is nonincreasing on $(0, \infty)$. Finally, since

$$w_\mu v(x_k - x_\mu) + A_\mu(w_\mu) \le w_\mu v(x_{\mu-1} - x_\mu) + A_\mu(w_\mu) \quad \text{when } k < \mu$$

and $S'$ is nondecreasing on $(-\infty, 0)$, Inequality 4.6 implies Inequality A-14 for $k < \mu$ ($1 \le \mu \le m$). Note that we used the fact that $S$ is an odd function (which implies that $S'$ is even). ////